Motivation: Leishmania infantum is the primary cause of visceral leishmaniasis (VL), and its trypanothione reductase (TR) creates a favorable environment for the parasite in the host, making TR an attractive drug target. This study aims to identify potential TR inhibitors among Azadirachta indica phytochemicals using molecular modeling techniques. Results: Sixty compounds from A. indica were screened by molecular docking for their binding affinity to TR, followed by binding free energy calculations. The drug-likeness, pharmacokinetic, and toxicity properties of the hit compounds were then evaluated. The top compounds were subjected to 100 ns molecular dynamics (MD) simulations to further assess the stability of their interaction with TR. Ten of the screened compounds showed higher affinity for TR than the standard drug miltefosine, with docking scores ranging from -8.482 to -3.501 kcal/mol versus -3.231 kcal/mol for miltefosine. All drug-like hit compounds showed favorable pharmacokinetic and toxicity profiles, and their binding free energies indicated stable interactions. MD simulations showed that these interactions persisted for most of the simulation time, indicating the stability and potential efficacy of the compounds as TR inhibitors. Availability and Implementation: This study identifies isorhamnetin, meliantriol, and quercetin as promising candidates for further in vitro and in vivo evaluation in the development of TR inhibitors against L. infantum.
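The screening logic in the TR abstract (keep compounds that dock more strongly than the reference drug, then filter for drug-likeness) can be sketched as follows. This is a minimal illustration, assuming docking scores in kcal/mol where more negative means stronger predicted binding, and using Lipinski's rule of five as the drug-likeness filter; the abstract does not say which drug-likeness criteria were used, and the compound names and descriptor values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    docking_score: float  # kcal/mol; more negative = stronger predicted binding
    mol_weight: float     # Da
    logp: float           # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def screen(compounds, reference_score, max_violations=1):
    """Keep compounds that dock better than the reference and pass the
    rule-of-five filter, sorted from strongest to weakest docking score."""
    hits = [
        c for c in compounds
        if c.docking_score < reference_score
        and lipinski_violations(c) <= max_violations
    ]
    return sorted(hits, key=lambda c: c.docking_score)


MILTEFOSINE_SCORE = -3.231  # reference docking score reported in the abstract

# Hypothetical compounds for illustration only (not values from the study).
library = [
    Compound("cmpd_A", -8.0, 320.0, 1.8, 4, 7),
    Compound("cmpd_B", -5.0, 720.0, 6.2, 7, 12),  # four violations: filtered out
    Compound("cmpd_C", -2.9, 250.0, 2.0, 1, 3),   # weaker than the reference
    Compound("cmpd_D", -4.1, 410.0, 3.5, 2, 6),
]

for c in screen(library, MILTEFOSINE_SCORE):
    print(c.name, c.docking_score)
```

Allowing one violation follows the usual reading of the rule of five (poor oral absorption becomes likely when more than one rule is broken).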
Title: Computational identification of Azadirachta indica compounds targeting trypanothione reductase in Leishmania infantum. Authors: Onile Olugbenga Samson, Olukunle Samuel, Fadahunsi Adeyinka Ignatius, Onile Tolulope Adelonpe, Momoh Abdul, Kolawole Oladipo, Afolabi Titilope Esther, Raji Omotara, Hassan Nour, Samir Chtita. Journal: Bioinformatics Advances 6(1): vbaf318. Published: 2025-12-17. DOI: 10.1093/bioadv/vbaf318.
Pub Date: 2025-12-17; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf301
Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux
Summary: Mass spectrometry (MS) is a cornerstone technology in modern molecular biology, powering diverse applications across proteomics, metabolomics, lipidomics, glycomics, and beyond. As the field continues to evolve, rapid advancements in instrumentation, acquisition strategies, machine learning, and scalable computing have reshaped the landscape of computational MS. This perspective reviews recent developments and highlights key challenges, including data harmonization, statistical confidence estimation, repository-scale analysis, multi-omics integration, and privacy in clinical MS. We also discuss the increasing importance of machine learning and the need to build corresponding literacy within the community. Finally, we reflect on the role of the Computational Mass Spectrometry (CompMS) Community of Special Interest of the International Society for Computational Biology in supporting collaboration, innovation, and knowledge exchange. With MS-based technologies now central to both basic and translational research, continued investment in robust and reproducible computational methods will be essential to realize their full potential.
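Statistical confidence estimation, one of the challenges listed in the perspective above, is commonly handled in proteomics with the target-decoy approach. The sketch below shows that general technique, not a method proposed in this perspective: it assumes a list of peptide-spectrum matches (PSMs) scored so that higher is better, each flagged as a target or decoy hit, estimates the false discovery rate (FDR) as decoys/targets above each score threshold, and converts it into q-values. Function names are illustrative, and tied scores are not treated specially.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide-spectrum matches (PSMs).

    psms: list of (score, is_decoy) tuples; higher score = better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each threshold, estimated as (#decoys) / (#targets) above it.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the lowest score upwards.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(results, threshold=0.01):
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, q in results if not is_decoy and q <= threshold)


# Toy example: five PSMs, two of them decoys.
results = target_decoy_qvalues(
    [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
)
print([round(q, 3) for _, _, q in results])  # [0.0, 0.0, 0.333, 0.333, 0.667]
print(accepted_targets(results, 0.01))       # 2
```

Taking the running minimum makes q-values monotone in the score, so raising the acceptance threshold never lowers a PSM's reported confidence.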
Title: Perspectives in computational mass spectrometry: recent developments and key challenges. Authors: Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux. Journal: Bioinformatics Advances 5(1): vbaf301. Published: 2025-12-17. DOI: 10.1093/bioadv/vbaf301.
Motivation: Drug-target interaction (DTI) prediction accelerates drug discovery by identifying interactions between chemical compounds and proteins. Existing methods often rely on drug-drug and protein-protein similarity graphs but process them independently, limiting their ability to model interdependencies between modalities. Moving beyond isolated embedding generation from protein and drug graphs, we propose DCGAT-DTI, a novel deep learning framework whose dynamic cross-graph attention (DCGAT) module models both intra- and cross-graph interactions. Initial embeddings are generated using pretrained language models. Similarity graphs constructed from these embeddings are passed to DCGAT, which uses a Graph Convolutional Network-based Cross-Neighborhood Selection network to dynamically select cross-modal neighbors. This allows drug and protein embeddings to incorporate information from both modalities through intra- and cross-graph attention mechanisms.
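The pipeline described above (similarity graphs built from embeddings, then attention over both same-modality and cross-modality neighbours) can be illustrated with a deliberately small sketch. It assumes drug and protein embeddings have already been projected into a shared dimension (the projection is omitted), replaces the paper's GCN-based Cross-Neighborhood Selection network with a fixed top-k cosine selection, and uses plain scaled dot-product attention without learned weights; all function names are hypothetical and this is not the DCGAT-DTI implementation.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_similarity_graph(embeddings, k):
    """Connect each node to its k most cosine-similar other nodes."""
    graph = {}
    for i, e in enumerate(embeddings):
        sims = [(cosine(e, f), j) for j, f in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)
        graph[i] = [j for _, j in sims[:k]]
    return graph


def top_k_cross_neighbors(query, others, k):
    """Stand-in for learned cross-modal neighbour selection: top-k by cosine."""
    ranked = sorted(range(len(others)),
                    key=lambda j: cosine(query, others[j]), reverse=True)
    return ranked[:k]


def attend(query, neighbors):
    """Scaled dot-product attention: softmax-weighted sum of neighbour vectors."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, n)) / math.sqrt(d) for n in neighbors]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    out = [sum(w * n[t] for w, n in zip(weights, neighbors)) for t in range(d)]
    return out, weights


def update_drug(drug_idx, drug_emb, prot_emb, drug_graph, k_cross=2):
    """Update one drug embedding from its intra-graph drug neighbours plus
    cross-graph protein neighbours, pooled in a single attention step."""
    query = drug_emb[drug_idx]
    intra = [drug_emb[j] for j in drug_graph[drug_idx]]
    cross = [prot_emb[j] for j in top_k_cross_neighbors(query, prot_emb, k_cross)]
    return attend(query, intra + cross)


# Toy 2-D embeddings (hypothetical).
drug_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prot_emb = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
drug_graph = knn_similarity_graph(drug_emb, k=1)
new_vec, weights = update_drug(0, drug_emb, prot_emb, drug_graph)
```

A protein embedding would be updated symmetrically, attending over its protein-graph neighbours and selected drug neighbours.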
Results: Extensive evaluations on four benchmark datasets show that DCGAT-DTI outperforms state-of-the-art methods under both warm-start and cold-start splits, on both balanced and unbalanced datasets. In the challenging unbalanced cold-start scenarios, it achieves significant performance improvements over the baselines for both drugs and proteins.
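In a cold-start split, the test set contains drugs (or proteins) that never appear during training, which is why it is harder than a warm split. A minimal sketch, assuming interactions are stored as (drug, protein, label) triples; the function name and the 20% default test fraction are illustrative, since the abstract does not give the paper's exact split protocol.

```python
import random


def cold_start_split(pairs, by="drug", test_fraction=0.2, seed=0):
    """Split (drug, protein, label) triples so that every held-out drug
    (or protein) is absent from the training set."""
    idx = {"drug": 0, "protein": 1}[by]
    entities = sorted({p[idx] for p in pairs})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(entities)
    n_test = max(1, round(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [p for p in pairs if p[idx] not in held_out]
    test = [p for p in pairs if p[idx] in held_out]
    return train, test


# Hypothetical interaction triples.
pairs = [("d1", "p1", 1), ("d1", "p2", 0), ("d2", "p1", 1),
         ("d3", "p3", 0), ("d4", "p2", 1)]
train, test = cold_start_split(pairs, by="drug", test_fraction=0.25, seed=1)
```

A warm split, by contrast, would shuffle the triples themselves, so most test drugs and proteins also occur in training.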
Availability and implementation: Source code is available at https://github.com/compbiolabucf/DCGAT-DTI.
Title: DCGAT-DTI: dynamic cross-graph attention network for drug-target interaction prediction. Authors: Abrar Rahman Abir, Muhtasim Noor Alif, Wencai Zhang, Khandakar Tanvir Ahmed, Wei Zhang. Journal: Bioinformatics Advances 6(1): vbaf306. Published: 2025-12-15. DOI: 10.1093/bioadv/vbaf306.
Pub Date: 2025-12-14; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf316
Nidhi Pai, J Sunil Rao
Motivation: Understanding the role of DNA methylation in oncogenesis, diagnosis, and treatment requires data of sufficient size and accuracy, but currently available epigenetic data are limited, especially for population groups underrepresented in research. We propose a framework for generating highly accurate DNA methylation predictions using classified mixed model prediction (CMMP), incorporating a step that clusters patients into cross-cancer and cross-race groups.
Results: Simulations show that our framework predicts the underlying mixed effects more accurately than regression prediction and naive estimates, extending previous work to the case where clusters are estimated from the data. We illustrate this framework using data from The Cancer Genome Atlas, uncovering clustering patterns and generating DNA methylation predictions for further analysis. Our work demonstrates how shared random effects can be leveraged to borrow strength across observations with similar methylation patterns.
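The idea behind classified mixed model prediction can be illustrated with a deliberately simplified random-intercept model, y_ij = mu + u_i + e_ij, with known variance components. Each cluster effect u_i is estimated by shrinking the cluster mean toward the grand mean (the BLUP under this model, here using the plain grand mean rather than its GLS estimate), which is how clusters borrow strength; a new patient is then assigned to the cluster whose fitted mean best matches their observed values, and that cluster's mixed-effect estimate becomes the prediction. This sketch omits covariates, variance-component estimation, and the data-driven clustering step; it is not the authors' R implementation, and the function names and toy values are hypothetical.

```python
from statistics import mean


def fit_cluster_effects(y_by_cluster, sigma2_e, sigma2_u):
    """Random-intercept model y_ij = mu + u_i + e_ij with known variances.

    Returns the grand mean mu and shrunken (BLUP-style) cluster effects u_i.
    """
    all_y = [v for ys in y_by_cluster.values() for v in ys]
    mu = mean(all_y)
    effects = {}
    for cluster, ys in y_by_cluster.items():
        n = len(ys)
        # Shrinkage toward 0 is stronger for small or noisy clusters.
        shrink = sigma2_u / (sigma2_u + sigma2_e / n)
        effects[cluster] = shrink * (mean(ys) - mu)
    return mu, effects


def cmmp_predict(new_y, mu, effects):
    """Classify a new subject to the cluster minimising the squared error
    between its observations and the fitted cluster mean, then predict."""
    best = min(effects,
               key=lambda c: sum((v - mu - effects[c]) ** 2 for v in new_y))
    return best, mu + effects[best]


# Toy data: two well-separated clusters (hypothetical values).
clusters = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.2, 4.8]}
mu, effects = fit_cluster_effects(clusters, sigma2_e=0.3, sigma2_u=1.0)
label, prediction = cmmp_predict([4.9, 5.1], mu, effects)
```

Here the new subject is matched to cluster B, and its prediction lies between the grand mean and cluster B's raw mean because of shrinkage.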
Availability and implementation: The methods are implemented in R and available at: https://github.com/nidhipai/dnam_cmmp.
Title: Data-based clustering in prediction of cervical cancer DNA methylation using pan-cancer genetic and clinical data. Authors: Nidhi Pai, J Sunil Rao. Journal: Bioinformatics Advances 6(1): vbaf316. Published: 2025-12-14. DOI: 10.1093/bioadv/vbaf316.
Motivation: Many topics in the study of the origin and early evolution of life are amenable to computational research strategies. Over a decade ago, the original LUCApedia was developed to facilitate such research. Here we describe a substantially overhauled LUCApedia database and web server.
Results: The database is composed of 17 different datasets based on previous studies or published hypotheses about the last universal common ancestor and its evolutionary predecessors. Similar to the original LUCApedia database, these datasets are mapped onto a common framework so that they can be corroborated with one another and used to examine continuity across different stages of early evolution.
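Mapping datasets onto a common framework is what makes cross-dataset corroboration possible: once every dataset is expressed in the same identifier space, one can count how many independent studies support each item. A minimal sketch, assuming each dataset has been reduced to a set of identifiers in one shared namespace (for example, protein family IDs); the dataset and identifier names are hypothetical, and this is not LUCApedia's actual schema.

```python
from collections import Counter


def support_counts(datasets):
    """Count how many datasets include each shared identifier."""
    counts = Counter()
    for ids in datasets.values():
        counts.update(set(ids))  # each dataset votes at most once per identifier
    return counts


def corroborated(datasets, min_support=2):
    """Identifiers supported by at least `min_support` datasets, sorted."""
    counts = support_counts(datasets)
    return sorted(i for i, n in counts.items() if n >= min_support)


def supporting_datasets(datasets, identifier):
    """Names of the datasets that include a given identifier."""
    return sorted(name for name, ids in datasets.items() if identifier in ids)


# Hypothetical datasets already mapped to a common identifier space.
datasets = {
    "study_1": {"fam_A", "fam_B"},
    "study_2": {"fam_B", "fam_C"},
    "study_3": {"fam_A", "fam_B"},
}
print(corroborated(datasets))                  # ['fam_A', 'fam_B']
print(supporting_datasets(datasets, "fam_C"))  # ['study_2']
```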
Availability and implementation: The database can be searched, browsed, and downloaded from the LUCApedia web server, https://lucapedia.org/.
Title: A new LUCApedia database for data-driven research on early evolutionary history. Authors: Zahra Nikfarjam, Ishaan Thota, Alireza Nikfarjam, Freya Kailing, Aaron D Goldman. Journal: Bioinformatics Advances 6(1): vbaf309. Published: 2025-12-04. DOI: 10.1093/bioadv/vbaf309.
Pub Date: 2025-12-02; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf312
Lapo Doni, Alessia Marotta, Luigi Vezzulli, Emanuele Bosi
Motivation: The next-generation sequencing revolution has established metabarcoding as an efficient and cost-effective method for exploring community composition. Amplicon sequencing of taxonomic marker genes, such as the 16S rRNA gene in prokaryotes, enables high-throughput taxonomic profiling. The advent of long-read technologies has made it feasible to sequence the whole 16S rRNA gene rather than only a few of its hypervariable regions, with the potential to achieve species-level resolution. Despite the affordability and scalability of such experiments, a major bottleneck remains the lack of integrated, user-friendly analytical workflows. Current pipelines often require multiple tools with complex dependencies, and parameters are frequently optimized manually, limiting reproducibility and overall efficiency.
Results: To address these limitations, we developed AmpWrap, an automated, one-line workflow for analysing both Illumina and Nanopore amplicons. It requires minimal effort from the user and automatically optimizes the trimming parameter to retain the maximum number of reads and information while reducing noise.
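AmpWrap's automatic trimming optimization is described only at a high level. One plausible way to frame the trade-off between keeping reads and removing noise is a grid search over truncation lengths using the expected-error criterion (the sum of 10^(-Q/10) over a read's Phred scores), similar in spirit to the truncLen/maxEE filtering in DADA2's filterAndTrim. The scoring rule (reads retained times truncation length), the function names, and the thresholds below are assumptions for illustration, not AmpWrap's actual algorithm.

```python
def expected_errors(quals):
    """Expected number of errors in a read from its Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def choose_truncation_length(reads, candidates, max_ee=2.0):
    """Pick the truncation length that retains the most bases after filtering.

    reads: list of per-read Phred score lists.
    A read passes at length L if it is at least L long and the expected
    errors in its first L bases are <= max_ee. Score = passing reads * L.
    """
    best_len, best_score = None, -1
    for length in candidates:
        passing = sum(
            1 for q in reads
            if len(q) >= length and expected_errors(q[:length]) <= max_ee
        )
        score = passing * length
        if score > best_score:  # ties keep the shorter length
            best_len, best_score = length, score
    return best_len, best_score


# Toy reads: high quality for 6 bases, then a sharp quality drop.
reads = [[40] * 6 + [2] * 4 for _ in range(3)]
print(choose_truncation_length(reads, range(4, 11), max_ee=1.0))  # (7, 21)
```

Truncating at 7 keeps one low-quality base (about 0.63 expected errors) while still passing all reads; at 8 the expected errors exceed the threshold and every read is discarded.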
Availability and implementation: AmpWrap is available at: https://github.com/LDoni/AmpWrap.
Title: AmpWrap: a one-line fully automated amplicon metabarcoding 16S and 18S rRNA gene analysis. Authors: Lapo Doni, Alessia Marotta, Luigi Vezzulli, Emanuele Bosi. Journal: Bioinformatics Advances 5(1): vbaf312. Published: 2025-12-02. DOI: 10.1093/bioadv/vbaf312.
Pub Date: 2025-12-02; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf311
Bernardo Velozo, Clara Carvalho, Rayssa Feitosa, Lucas Aleixo Leal Pedroza, Emerson Danzer, Sandy Ingrid Aguiar Alves, Maira Neves, Bibiana Fam
Motivation: Bioinformatics drives modern biological discovery, and Brazil has become an important contributor to genomics and computational biology. However, bioinformatics education across the country struggles to meet diverse regional and professional demands. To respond to these challenges, the Regional Student Group of Brazil created an Educational Committee in 2019 to expand Portuguese-language resources and evaluate national training needs. Here, we apply the Core Competency 3.0 framework to establish a seven-domain training model spanning foundational biological, statistical, and computational skills, ethical principles, applied bioinformatics practices, communication abilities, and continuous professional development.
Results: A nationwide survey of 375 respondents from more than 21 Brazilian states revealed pronounced geographic and career-based disparities in bioinformatics training. Individuals who primarily use bioinformatics tools, largely students, showed strong interest in phylogenetics and evolutionary analyses, while those focused on software and tool development prioritized computational methods. These findings demonstrate how educational needs differ across profiles and regions, emphasizing the importance of localized strategies to address Brazil's heterogeneous training landscape. Unlike broad competency frameworks, this data-driven approach identifies specific gaps and areas of high demand.
Availability and implementation: By integrating these insights, the Regional Student Group of Brazil proposes an equitable and scalable education model that supports curriculum development and helps strengthen training in regions with limited opportunities, offering a framework adaptable to global scientific communities facing similar socioeconomic challenges.
Title: Mapping educational needs in bioinformatics in Brazil: adapting ISCB 3.0 competencies to a regional context. Authors: Bernardo Velozo, Clara Carvalho, Rayssa Feitosa, Lucas Aleixo Leal Pedroza, Emerson Danzer, Sandy Ingrid Aguiar Alves, Maira Neves, Bibiana Fam. Journal: Bioinformatics Advances 5(1): vbaf311. Published: 2025-12-02. DOI: 10.1093/bioadv/vbaf311.
Pub Date: 2025-12-01; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf310
James Urban, Roman Joeres, Daniel Bojar
Motivation: As the field of glycobiology has developed, so too have different glycan nomenclature systems. While each system serves specific purposes, this multiplicity creates challenges for usability, data integration, and knowledge sharing across different databases and computational tools.
Results: We present a practical framework for automated nomenclature conversion that takes any glycan nomenclature as input without requiring declaration of the specific language and outputs a canonicalized IUPAC-condensed format as a standardized representation. Our implementation handles all common nomenclatures including WURCS, GlycoCT, IUPAC-condensed/extended, GLYCAM, CSDB-linear, LinearCode, GlycoWorkbench, GlySeeker, Oxford, and KCF, along with common typos, and manages complex cases including structural ambiguities, modifications, uncertainty in linkage information, and different compositional representations. This Universal Input framework can translate more than 10 nomenclatures in <1 ms per glycan, tested on over 150 000 sequences with 98%-100% coverage, enabling seamless integration of existing glycan databases and tools while maintaining the specific advantages of each representation system.
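Because Universal Input accepts sequences without a declared nomenclature, some form of format detection has to happen before conversion. The toy detector below only illustrates that idea with a few surface features (the `WURCS=` prefix, a GlycoCT `RES` section, a KCF `ENTRY` line, and parenthesised IUPAC-condensed linkages such as `(b1-4)`); it is hypothetical, covers far fewer formats than the abstract lists, and is not how glycowork implements detection.

```python
import re

# Ordered (format name, test) pairs; the first matching rule wins.
_RULES = [
    ("WURCS", lambda s: s.startswith("WURCS=")),
    ("GlycoCT", lambda s: s.startswith("RES")),
    ("KCF", lambda s: s.startswith("ENTRY")),
    # IUPAC-condensed writes linkages in parentheses, e.g. Gal(b1-4)GlcNAc;
    # '?' marks an unknown anomeric state or position.
    ("IUPAC-condensed",
     lambda s: re.search(r"\([ab?][\d?]?-[\d?]?\)", s) is not None),
]


def guess_nomenclature(seq):
    """Return a best-guess nomenclature name, or 'unknown'."""
    s = seq.strip()
    for name, test in _RULES:
        if test(s):
            return name
    return "unknown"


print(guess_nomenclature("Gal(b1-4)GlcNAc"))  # IUPAC-condensed
print(guess_nomenclature("FA2G2"))            # unknown (Oxford not covered here)
```

A production detector would need many more rules and a fallback that tries parsers in turn, which is the kind of work the abstract describes.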
Availability and implementation: Universal Input is implemented within the glycowork Python package, available at https://github.com/BojarLab/glycowork and our web app https://canonicalize.streamlit.app/.
Title: Bridging worlds: connecting glycan representations with glycoinformatics via Universal Input and a canonicalized nomenclature. Authors: James Urban, Roman Joeres, Daniel Bojar. Journal: Bioinformatics Advances 5(1): vbaf310. Published: 2025-12-01. DOI: 10.1093/bioadv/vbaf310.
Pub Date: 2025-11-27; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf307
Josh Dyce, Lea Rieskamp, Scott J Tebbutt, Bruce M McManus, Amrit Singh
Motivation: Machine learning offers a powerful approach for building predictive models from high-dimensional molecular data. Omics technologies such as transcriptomics, proteomics, and metabolomics quantify thousands of molecules simultaneously, providing deep insights into disease biology. Integrating multiple modalities can enhance predictive performance, as shown in histology-omics and Holter-omics applications. To support streamlined, reproducible, and user-friendly multimodal analytics, we developed Omics BioAnalytics, an R Shiny platform for unified analysis, integration, and interpretation of diverse omics datasets.
Results: Omics BioAnalytics performs late integration using ensembles of elastic net models trained independently on each modality, with predictions averaged across datasets. The platform provides interactive dashboards for metadata exploration, exploratory analyses, differential expression, gene set analysis, and biomarker discovery. Results are visualized through dynamic plots and downloadable reports, ensuring transparent and reproducible workflows. A unique feature is the integrated multimodal Alexa Skill, which enables voice-based querying and rapid visualization. Together, these web and voice-enabled tools offer accessible and reproducible multimodal analytics for biomedical researchers, supporting the discovery of molecular signatures, predictive biomarkers, and therapeutic targets.
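Late integration, as described for Omics BioAnalytics, fits one model per modality and averages their predictions. The platform itself is written in R; the sketch below is a dependency-free Python illustration, assuming a continuous outcome and an elastic-net linear model fitted by cyclic coordinate descent on centred data (a glmnet-style update). For a binary outcome one would swap in a logistic elastic net. Function names and default penalty values are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)
    by cyclic coordinate descent; intercept handled by centring."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual with all coefficients at 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:  # constant feature carries no information
                continue
            # Correlation of feature j with the partial residual (excluding j).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    return y_mean, x_mean, b


def predict(model, row):
    y_mean, x_mean, b = model
    return y_mean + sum((x - m) * bj for x, m, bj in zip(row, x_mean, b))


def late_integration(train_modalities, y, test_modalities, lam=0.1, alpha=0.5):
    """Fit one elastic net per modality; average predictions across modalities."""
    per_modality = []
    for X_train, X_test in zip(train_modalities, test_modalities):
        model = fit_elastic_net(X_train, y, lam, alpha)
        per_modality.append([predict(model, row) for row in X_test])
    n_test = len(test_modalities[0])
    return [sum(p[i] for p in per_modality) / len(per_modality) for i in range(n_test)]


# Toy example: two "modalities" measured on the same four samples.
XA = [[0.0], [1.0], [2.0], [3.0]]
XB = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0.0, 2.0, 4.0, 6.0]
print(late_integration([XA, XB], y, [[[4.0]], [[4.0, 5.0]]], lam=0.0))  # [8.0]
```

Averaging keeps each modality's model independent, so a modality can be added or dropped without refitting the others; the cost is that cross-modality interactions are not modelled.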
Availability and implementation: All source code, public datasets, video walkthroughs, and the deployed application are available at: https://github.com/CompBio-Lab/omicsBioAnalytics.
Title: Omics BioAnalytics: an RShiny application for multimodal biomarker panel discovery and assessment. Authors: Josh Dyce, Lea Rieskamp, Scott J Tebbutt, Bruce M McManus, Amrit Singh. Journal: Bioinformatics Advances 6(1): vbaf307. Published: 2025-11-27. DOI: 10.1093/bioadv/vbaf307.
Pub Date: 2025-11-27; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf305
Ali A Septiandri, Deyu Ming, Francisco Alejandro DiazDelaO, Takoua Jendoubi, Samiran Ray
Motivation: Healthcare data, particularly in critical care settings, presents three key challenges for analysis. First, physiological measurements come from different sources but are inherently related, yet traditional methods often treat each measurement type independently, losing valuable information about their relationships. Second, clinical measurements are collected at irregular intervals, and these sampling times can themselves carry clinical meaning. Finally, missing values are prevalent. Whilst several imputation methods exist to tackle this common problem, they often fail to address the temporal nature of the data or to provide estimates of uncertainty for their predictions.
Results: We propose using deep Gaussian process emulation with stochastic imputation, a methodology originally conceived for emulating computationally expensive models and quantifying uncertainty, to handle the missing values that naturally occur in critical care data. This method leverages both longitudinal and cross-sectional information and provides uncertainty estimates for the imputed values. Our evaluation on a clinical dataset shows that the proposed method outperforms conventional methods such as multiple imputation by chained equations (MICE), last-known value imputation, and individually fitted Gaussian processes (GPs).
Availability and implementation: The source code of the experiments is freely available at: https://github.com/aliakbars/dgpsi-picu.
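Two ideas in the abstract can be made concrete with a small example: imputing at irregular time stamps, and attaching an uncertainty to each imputed value. The sketch below is not the authors' deep GP emulation with stochastic imputation. It is a plain single-layer GP over one measurement stream, closer to the "individually fitted GP" baseline the abstract mentions, written in stdlib Python. The RBF kernel, its hyperparameters (`variance`, `lengthscale`, `noise`), the use of `None` for missing values, and all function names are illustrative assumptions. A last-known-value (LOCF) baseline is included for contrast.

```python
import math


def rbf(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel over (possibly irregular) time stamps."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_impute(times, values, query_times, variance=1.0, lengthscale=1.0, noise=1e-2):
    """Posterior mean and latent variance of one GP at query_times; None = missing."""
    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    t_obs = [t for t, _ in obs]
    y_mean = sum(v for _, v in obs) / len(obs)
    K = [[rbf(a, b, variance, lengthscale) + (noise if i == j else 0.0)
          for j, b in enumerate(t_obs)] for i, a in enumerate(t_obs)]
    L = cholesky(K)
    # weights = K^{-1} (y - mean), via two triangular solves.
    weights = solve_upper_t(L, solve_lower(L, [v - y_mean for _, v in obs]))
    results = []
    for tq in query_times:
        k_star = [rbf(tq, t, variance, lengthscale) for t in t_obs]
        mu = y_mean + sum(k * w for k, w in zip(k_star, weights))
        v = solve_lower(L, k_star)
        var = rbf(tq, tq, variance, lengthscale) - sum(x * x for x in v)
        results.append((mu, max(var, 0.0)))
    return results


def locf(values):
    """Last-observation-carried-forward baseline; leading gaps stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out
```

For example, `gp_impute([0, 1, 2, 3, 4], [1.0, None, 3.0, None, 5.0], [1])` returns a mean between the surrounding observations. Its variance is well above the near-noise-level variance at observed times, whereas `locf` simply repeats 1.0 with no uncertainty attached. A deep GP would stack such layers and share information across streams, which this single-layer sketch does not attempt.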
Title: Integrative analysis and imputation of multiple data streams via deep Gaussian processes.
Journal: Bioinformatics Advances, 6(1), vbaf305. Published 2025-11-27.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776352/pdf/