Pub Date: 2026-01-02; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf330
Vincent Messow, Christian Höner Zu Siederdissen, Michael Habeck
Summary: The cell lists algorithm is widely used to compute pairwise particle interactions below a fixed cutoff distance in approximately linear time. Prominent molecular dynamics frameworks implementing cell lists variants assume pre-determined and densely populated simulation boxes suitable for e.g. all-atom simulations with explicit solvents. zelll implements a simple yet efficient variant of the cell lists algorithm that uses sparse storage for the underlying partitioning grid. This allows for applications with dynamic simulation boundaries and sparsely populated simulation space not strictly fitting into the scope of common molecular dynamics frameworks, such as many coarse-grained simulations. For this reason, zelll does not target specific frameworks.
Availability and implementation: zelll is an open-source Rust library available under the MIT license at https://github.com/microscopic-image-analysis/zelll and https://crates.io/crates/zelll. Python bindings are available at https://pypi.org/project/zelll.
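The idea of a cell lists search over a sparsely stored grid, as described in the summary, can be illustrated with a short Python sketch. This is not zelll's API or implementation; the function name `neighbor_pairs` and the dictionary-based grid are assumptions chosen for illustration. Only occupied cells are stored, so the grid needs no pre-determined simulation box.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def neighbor_pairs(points, cutoff):
    """Return sorted index pairs (i, j), i < j, with distance <= cutoff."""
    # Sparse grid: a dict keyed by integer cell coordinates, so empty
    # cells cost nothing and no bounding box has to be fixed in advance.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(floor(c / cutoff) for c in p)].append(idx)

    dim = len(points[0]) if points else 0
    offsets = list(product((-1, 0, 1), repeat=dim))
    pairs = []
    for cell, members in cells.items():
        for off in offsets:
            other = tuple(c + o for c, o in zip(cell, off))
            if other not in cells:  # neighbouring cell is empty
                continue
            for i in members:
                for j in cells[other]:
                    # i < j reports every pair exactly once.
                    if i < j and dist(points[i], points[j]) <= cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0),
          (5.0, 5.4, 5.0), (-0.3, 0.0, 0.0)]
pairs = neighbor_pairs(points, cutoff=1.0)
```

With cell edges equal to the cutoff, each particle only has to be compared against particles in its own cell and the directly adjacent cells, which is what keeps the cost roughly linear for evenly spread particles.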
Title: zelll: a fast, framework-free, and flexible implementation of the cell lists algorithm for the Rust programming language. Bioinformatics Advances 6(1): vbaf330. Published 2026-01-02. DOI: https://doi.org/10.1093/bioadv/vbaf330. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910374/pdf/
Pub Date: 2025-12-31; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf329
Justin Merondun, Qingyi Yu
Motivation: Chromosome-level assemblies are essential for modern genomics, from comparative genomics and evolutionary studies to precision breeding. While integrated HiFi and Hi-C data now enable accurate chromosome-scale genome assemblies, the bioinformatic process remains complex and requires specialized tools and expertise. With large-scale pan-genomic efforts requiring dozens to hundreds of platinum-quality chromosome-scale genomes, there is a need for scalable, portable, and user-friendly pipelines that streamline and standardize high-quality genome assembly workflows.
Results: We introduce Puzzler, a containerized, scalable pipeline for chromosome-scale de novo genome assembly using PacBio HiFi and Hi-C data. Designed for portability and minimal user input, Puzzler automates contig assembly, duplicate purging, Hi-C-based scaffolding, and chromosome assignment via synteny, even with highly diverged reference taxa. Optional modules generate input files for manual Hi-C curation or operate reference-free. Quality control is integrated and includes Hi-C contact maps, BUSCO, yak k-mer completeness, and BlobTools contamination screening. A checkpointing system ensures that previously completed tasks are not re-executed, while a simple sample sheet input structure supports scalable batch processing. Puzzler has been validated on genomes ranging from 24 Mbp to 6.5 Gbp, delivering highly contiguous assemblies with <10 min of user input, enabling high-throughput platinum-quality genome assembly.
Availability and implementation: Puzzler is released into the public domain under 17 U.S.C. §105. Source code, documentation, and tutorials are available at https://github.com/merondun/puzzler and archived on Zenodo: https://doi.org/10.5281/zenodo.15733730 and https://doi.org/10.5281/zenodo.15693025. Pre-configured runtime environments, including dependencies, are provided via a Conda environment (https://anaconda.org/heritabilities/puzzler) and an Apptainer image hosted on both Zenodo and Sylabs (https://cloud.sylabs.io/library/merondun/default/puzzler).
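The checkpointing behaviour mentioned in the results (completed tasks are not re-executed) can be sketched in Python as follows. How Puzzler actually records completed tasks is not stated in the abstract; the marker-file scheme, the `run_step` helper, and the step name below are hypothetical.

```python
import tempfile
from pathlib import Path


def run_step(name, action, checkpoint_dir):
    """Run `action` unless a marker file shows that step `name` already finished."""
    marker = Path(checkpoint_dir) / f"{name}.done"
    if marker.exists():
        return False  # step was completed in an earlier run, skip it
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()  # marker is written only after the action succeeded
    return True


with tempfile.TemporaryDirectory() as tmp:
    log = []
    # Hypothetical step name; the lambda stands in for a real pipeline task.
    first = run_step("contig_assembly", lambda: log.append("assembled"), tmp)
    second = run_step("contig_assembly", lambda: log.append("assembled"), tmp)
```

Because the marker is written after the action returns, a step that fails partway leaves no marker and is retried on the next run.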
Title: Puzzler: scalable one-command platinum-quality genome assembly from HiFi and Hi-C. Bioinformatics Advances 6(1): vbaf329. Published 2025-12-31. DOI: https://doi.org/10.1093/bioadv/vbaf329. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820402/pdf/
Pub Date: 2025-12-27; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf324
Emilios Tassios, Jori de Leuw, Christoforos Nikolaou, Anne Kupczok, Nikolaos Vakirlis
Motivation: Species-specific orphan genes lack homologues outside of a given taxon and frequently underlie unique species traits. Orphans can result from sequence divergence beyond recognition, when homologous proteins diverge to an extent at which sequence similarity search algorithms can no longer identify them as homologues, but they can also evolve de novo from previously noncoding sequences, in which case homologous protein-coding genes truly do not exist.
Results: Here we propose that sequence-divergent orphans might be recognizable from their patterns of statistically non-significant similarity hits, which are typically discarded. To test this, we simulated diverged orphan protein sequences under varying parameters. Using reversed protein sequences as a negative control, we trained machine learning classifiers on features extracted from similarity search output. We found that this approach works, but the performance of the models depends on the simulation parameters, with ∼90% accuracy when the underlying simulated divergence was moderate and ∼70% when it was extreme. When applying our classifiers to a set of real orphans, we found that ∼30% of them are predicted to be divergent and that these are shorter and more disordered than the rest. Our work contributes to the effort to better understand how genetic novelty arises.
Availability and implementation: The models and data used can be found at https://github.com/emiliostassios/Classification-of-divergent-genes-using-ML.
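The negative-control step described in the results can be illustrated with a minimal Python sketch. A reversed sequence keeps the length and amino-acid composition of the original. The helper name, the label encoding (1 for the original, 0 for the reversed copy), and the example sequence are assumptions for illustration and are not taken from the authors' repository.

```python
def with_reversed_controls(sequences):
    """Pair each protein sequence (label 1) with its reversed copy (label 0)."""
    labelled = []
    for name, seq in sequences.items():
        labelled.append((name, seq, 1))
        # Reversal preserves length and composition of the original sequence.
        labelled.append((f"{name}_rev", seq[::-1], 0))
    return labelled


# Hypothetical toy input; real inputs would be simulated diverged proteins.
dataset = with_reversed_controls({"p1": "MKTAYIAK"})
```

Both classes would then be run through the same similarity search so that features are extracted identically for positives and controls.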
Title: Machine learning can distinguish orphans that have resulted from sequence divergence beyond recognition. Bioinformatics Advances 6(1): vbaf324. Published 2025-12-27. DOI: https://doi.org/10.1093/bioadv/vbaf324. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904771/pdf/
Pub Date: 2025-12-27; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf321
Giovanni Maria De Filippis, Pranoy Sahu, Pasqualina Ambrosio, Stefania Picascia, Matteo Lo Monte, Ilenia Agliarulo, Simone Di Paola, Cristiano Russo, Christian Tommasino, Nicola Normanno, Daniela Frezzetti, Seetharaman Parashuraman, Antonio M Rinaldi, Francesco Russo
Motivation: Because both the analytical processes and the interpretation of results in RNA-seq expression data analysis are complex, researchers often require the support of bioinformaticians. Selecting appropriate statistical tests and performing essential data manipulations, such as normalization and filtering, in a rigorous and reproducible manner remains a significant challenge for many users.
Results: We developed REDAC, a web-based R application that offers an interactive platform designed to simplify and enhance RNA-seq expression data exploration and analysis. REDAC provides a straightforward approach to performing differential RNA-seq expression analysis rapidly, easily, and transparently through natural language queries from users. Moreover, it allows users to run complete analyses, generate comprehensive visualizations, and obtain biological interpretations of pathway enrichment results via two popular large language models, Gemma and LLaMA, guided by a PubMed-based Retrieval-Augmented Generation module. Finally, REDAC promotes reproducibility through the automated generation of analysis reports.
Availability and implementation: REDAC is available for local (https://github.com/franruss/REDAC) and online use (https://frusso.shinyapps.io/REDAC). User manual: https://github.com/franruss/REDAC/blob/main/docs/REDAC_user_manual.pdf.
Title: REDAC: RNA-seq expression data analysis chatbot. Bioinformatics Advances 6(1): vbaf321. Published 2025-12-27. DOI: https://doi.org/10.1093/bioadv/vbaf321. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12927421/pdf/
Motivation: Cyclic immunofluorescence (IF) techniques enable deep phenotyping of cells and help quantify tissue organization at high resolution. Because the data are high-dimensional, workflows typically rely on unsupervised clustering followed by cell type annotation at the cluster level. Most of these methods use marker expression averages and lack a statistical evaluation of cell type annotations, which can result in misclassification. Here, we propose an end-to-end pipeline that uses a semi-supervised random forest approach to predict cell type annotations.
Results: Our method includes cluster-based sampling for training data, cell type prediction, and downstream visualization for interpretability of cell annotation, which ultimately improves classification results. We show that our workflow annotates cells more accurately than representative deep learning and probabilistic methods, with a training set of <5% of the total number of cells tested. In addition, our pipeline outputs cell type probabilities and model performance metrics so that users can decide whether it could improve the results of their existing clustering-based workflow for complex IF data.
Availability and implementation: Fluoro-forest is freely available on GitHub under an MIT license (https://github.com/Josh-Brand/Fluoro-forest).
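Cluster-based sampling of training cells, as mentioned in the results, could look roughly like the Python sketch below. The sampling rule (a fixed fraction per cluster, with at least one cell each), the function name, and the toy cluster sizes are assumptions; the abstract does not describe Fluoro-forest's actual procedure in this detail.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_labels, fraction, seed=0):
    """Pick about `fraction` of the cells in every cluster (at least one each)."""
    by_cluster = defaultdict(list)
    for cell_index, cluster in enumerate(cluster_labels):
        by_cluster[cluster].append(cell_index)

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Hypothetical clustering: three clusters of very different sizes.
labels = [0] * 100 + [1] * 40 + [2] * 3
training_cells = sample_per_cluster(labels, fraction=0.05)
```

Sampling within each cluster rather than across all cells keeps rare clusters represented in a small training set, which matters when the set is limited to a few percent of the cells.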
Title: Fluoro-forest: a random forest workflow for cell type annotation in high-dimensional immunofluorescence imaging with limited training data. Authors: Joshua Brand, Wei Zhang, Evie Carchman, Huy Q Dinh. Bioinformatics Advances 6(1): vbaf320. Published 2025-12-24. DOI: https://doi.org/10.1093/bioadv/vbaf320. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12782655/pdf/
Pub Date: 2025-12-23; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf323
Ivana Vichentijevikj, Kostadin Mishev, Monika Simjanoska Misheva
Summary: This study presents a proof-of-concept, comprehensive, modular framework for AI-driven drug discovery (DD) and clinical trial simulation, spanning from target identification to virtual patient recruitment. Synthesized from a systematic analysis of 51 large language model (LLM)-based systems, the proposed Prompt-to-Pill architecture and its corresponding implementation leverage a multi-agent system (MAS) divided into DD, preclinical, and clinical phases, coordinated by a central Orchestrator. Each phase comprises specialized LLMs for molecular generation, toxicity screening, docking, trial design, and patient matching. To demonstrate the full pipeline in practice, the well-characterized target Dipeptidyl Peptidase 4 (DPP4) was selected as a representative use case. The process begins with generative molecule creation and proceeds through ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) evaluation, structure-based docking, and lead optimization. Clinical-phase agents then simulate trial generation, screen patient eligibility using electronic health records (EHRs), and predict trial outcomes. By tightly integrating generative, predictive, and retrieval-based LLM components, this architecture bridges the drug discovery and preclinical phases with virtual clinical development, demonstrating how LLM-based agents can operationalize the drug development workflow in silico.
Availability and implementation: The implementation and code are available at: https://github.com/ChatMED/Prompt-to-Pill.
Title: Prompt-to-Pill: Multi-Agent Drug Discovery and Clinical Simulation Pipeline. Bioinformatics Advances 6(1): vbaf323. Published 2025-12-23. DOI: https://doi.org/10.1093/bioadv/vbaf323. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800774/pdf/
Pub Date: 2025-12-23; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf322
Tung-Lin Tsai, Chien-Chong Hong, Hsing-Wen Cheng, Chin-An Yang
Motivation: Early detection of severe bloodstream infections is essential for early treatment initiation. However, the suspicion of bacteremia relies on the combined interpretation of routine laboratory tests, such as complete blood count (CBC), differential count (DC), and elevated C-reactive protein (CRP). Furthermore, a definite diagnosis of bacteremia requires a positive blood culture, which takes several days.
Results: We developed the Interpretable Hematology analyzer Impedance data-based Tabular network for early identification of Bacteremia in Emergency Department (IHIT-BED), a bloodstream infection prediction system built with machine learning methods. It integrates hematology analyzer impedance histogram signals from the CBC, blood culture reports, and CRP levels, all measured in the first blood draw of patients visiting the emergency department (ED). To our knowledge, IHIT-BED is the first predictor based on hematology impedance histogram signals. It performs well in predicting a positive blood culture and severe inflammation, and it is also sensitive to changes in blood cell morphology that correlate with active inflammatory responses to bacterial infections. IHIT-BED provides clinical decision support for prompt initiation of antibiotic treatment.
Availability and implementation: The method can be found at https://github.com/appleRtsan/IHIT-BED.
Title: IHIT-BED: an interpretable transformer approach using unbiased hematology analyzer impedance data for early identification of bacteremia in emergency department. Bioinformatics Advances 6(1): vbaf322. Published 2025-12-23. DOI: https://doi.org/10.1093/bioadv/vbaf322. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12895069/pdf/
Pub Date: 2025-12-19; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf319
Naroa Barrena, Carlos Rodriguez-Flores, Luis V Valcárcel, Danel Olaverri-Mendizabal, Xabier Agirre, Felipe Prósper, Francisco J Planes
Motivation: The integration of genome-scale metabolic and regulatory networks has received significant interest in cancer systems biology. However, the identification of lethal genetic interventions in these integrated models remains challenging due to the combinatorial explosion of potential solutions. To address this, we developed the genetic Minimal Cut Set (gMCS) framework, which computes synthetic lethal interactions (minimal sets of gene knockouts that are lethal for cellular proliferation) in genome-scale metabolic networks with signed directed acyclic regulatory pathways. Here, we present a novel formulation to calculate genetic Minimal Intervention Sets (gMISs), which incorporate both gene knockouts and knock-ins.
Results: With our gMIS approach, we assessed the landscape of lethal genetic interactions in human cells, capturing interventions beyond synthetic lethality, including synthetic dosage lethality and tumor suppressor gene complexes. We applied the concept of synthetic dosage lethality to predict essential genes in cancer and demonstrated a significant increase in sensitivity when compared to large-scale gene knockout screen data. We also analyzed tumor suppressors in cancer cell lines and identified lethal gene knock-in strategies. Finally, we demonstrate how gMISs can help uncover potential therapeutic targets, providing examples in hematological malignancies.
Availability and implementation: The gMCSpy Python package now includes gMIS functionalities. Access: https://github.com/PlanesLab/gMCSpy.
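The notion of a minimal cut set, a set of gene knockouts that is lethal while none of its proper subsets is, can be made concrete with a brute-force Python sketch on a toy model. This does not use gMCSpy or reproduce its formulation, and it covers knockouts only, not knock-ins. The function `minimal_cut_sets`, the four-gene model, and its viability rule are hypothetical, and exhaustive enumeration is only feasible for very small gene sets.

```python
from itertools import combinations


def minimal_cut_sets(genes, is_viable, max_size=3):
    """Enumerate minimal knockout sets (up to `max_size`) that abolish viability."""
    found = []
    for size in range(1, max_size + 1):
        for knockout in combinations(sorted(genes), size):
            ko = set(knockout)
            # A superset of an already found cut set cannot be minimal.
            if any(prev <= ko for prev in found):
                continue
            if not is_viable(set(genes) - ko):
                found.append(ko)
    return found


def toy_viable(active):
    """Hypothetical rule: growth needs D plus either A or both B and C."""
    return "D" in active and ("A" in active or {"B", "C"} <= active)


cut_sets = minimal_cut_sets({"A", "B", "C", "D"}, toy_viable)
```

In this toy model, D is essential on its own, while A forms a synthetic lethal pair with B and with C. The number of candidate sets grows combinatorially with the number of genes, which is the difficulty the motivation paragraph refers to.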
Title: Beyond synthetic lethality in large-scale metabolic and regulatory network models via genetic minimal intervention set. Bioinformatics Advances 6(1): vbaf319. Published 2025-12-19. DOI: https://doi.org/10.1093/bioadv/vbaf319. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784249/pdf/
Motivation: Leishmania infantum is the primary cause of visceral leishmaniasis (VL), and its trypanothione reductase (TR) creates a favorable environment in the host, making TR an attractive drug target. This study aims to identify potential TR inhibitors from Azadirachta indica phytochemicals using molecular modeling techniques. Results: Sixty compounds from A. indica were screened via molecular docking for their binding affinity to TR, followed by binding free energy calculations. Drug-likeness, pharmacokinetics, and toxicity properties of the hit compounds were then evaluated. The top compounds were subjected to a 100 ns molecular dynamics (MD) simulation to further assess the stability of their interaction with TR. Ten of the screened compounds exhibited higher affinity for TR than miltefosine (the standard drug), with docking scores ranging from -3.501 to -8.482 kcal/mol versus -3.231 kcal/mol for miltefosine. All the drug-like hit compounds showed favorable pharmacokinetics and toxicity profiles, and their binding free energies indicated stable interactions. MD simulations showed that these interactions persisted for most of the simulation time, supporting the stability and potential efficacy of the compounds as TR inhibitors. Availability and Implementation: This study identifies isorhamnetin, meliantriol, and quercetin as promising candidates for further in vitro and in vivo evaluation for the development of TR inhibitors against L. infantum.
{"title":"Computational identification of <i>Azadirachta indica</i> compounds targeting trypanothione reductase in <i>Leishmania infantum</i>.","authors":"Onile Olugbenga Samson, Olukunle Samuel, Fadahunsi Adeyinka Ignatius, Onile Tolulope Adelonpe, Momoh Abdul, Kolawole Oladipo, Afolabi Titilope Esther, Raji Omotara, Hassan Nour, Samir Chtita","doi":"10.1093/bioadv/vbaf318","DOIUrl":"10.1093/bioadv/vbaf318","url":null,"abstract":"<p><p><b>Motivation:</b> <i>Leishmania infantum</i> is the primary cause of VL, and its trypanothione reductase (TR) creates a favorable environment in the host, making TR an attractive drug target. This study aims to identify potential TR inhibitors from <i>Azadirachta indica</i> phytochemicals using molecular modeling techniques. <b>Results:</b> Sixty compounds from <i>A. indica</i> were screened via molecular docking for their binding affinity to TR, followed by binding free energy calculations. Drug-likeness, pharmacokinetics, and toxicity properties of the hit compounds were then evaluated. The top compounds were subjected to a 100 ns molecular dynamics (MDs) simulation to further assess the stability of their interaction with TR. Ten of the screened compounds exhibited higher affinity for TR compared to miltefosine (standard drug), with docking scores ranging from -3.501 to -8.482 kcal/mol, compared to miltefosine's -3.231 kcal/mol. All the drug-like hit compounds showed favorable pharmacokinetics and toxicity profiles and their binding free energies indicated stable interactions. MDs simulations confirmed that these interactions persisted for most of the simulation time, confirming the stability and potential efficacy of the compounds as TR inhibitors. <b>Availability and Implementation:</b> This study identifies isorhamnetin, meliantriol, and quercetin as promising candidates for further <i>in vitro</i> and <i>in vivo</i> evaluation for the development of TR inhibitors against <i>L. 
infantum</i>.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbaf318"},"PeriodicalIF":2.8,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776344/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145936643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-17 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf301
Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux
Summary: Mass spectrometry (MS) is a cornerstone technology in modern molecular biology, powering diverse applications across proteomics, metabolomics, lipidomics, glycomics, and beyond. As the field continues to evolve, rapid advancements in instrumentation, acquisition strategies, machine learning, and scalable computing have reshaped the landscape of computational MS. This perspective reviews recent developments and highlights key challenges, including data harmonization, statistical confidence estimation, repository-scale analysis, multi-omics integration, and privacy in clinical MS. We also discuss the increasing importance of machine learning and the need to build corresponding literacy within the community. Finally, we reflect on the role of the Computational Mass Spectrometry (CompMS) Community of Special Interest of the International Society for Computational Biology in supporting collaboration, innovation, and knowledge exchange. With MS-based technologies now central to both basic and translational research, continued investment in robust and reproducible computational methods will be essential to realize their full potential.
{"title":"Perspectives in computational mass spectrometry: recent developments and key challenges.","authors":"Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux","doi":"10.1093/bioadv/vbaf301","DOIUrl":"10.1093/bioadv/vbaf301","url":null,"abstract":"<p><p><b>Summary</b>: Mass spectrometry (MS) is a cornerstone technology in modern molecular biology, powering diverse applications across proteomics, metabolomics, lipidomics, glycomics, and beyond. As the field continues to evolve, rapid advancements in instrumentation, acquisition strategies, machine learning, and scalable computing have reshaped the landscape of computational MS. This perspective reviews recent developments and highlights key challenges, including data harmonization, statistical confidence estimation, repository-scale analysis, multi-omics integration, and privacy in clinical MS. We also discuss the increasing importance of machine learning and the need to build corresponding literacy within the community. Finally, we reflect on the role of the Computational Mass Spectrometry (CompMS) Community of Special Interest of the International Society for Computational Biology in supporting collaboration, innovation, and knowledge exchange. 
With MS-based technologies now central to both basic and translational research, continued investment in robust and reproducible computational methods will be essential to realize their full potential.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf301"},"PeriodicalIF":2.8,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715313/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145806352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}