Pub Date: 2025-12-23; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf323
Ivana Vichentijevikj, Kostadin Mishev, Monika Simjanoska Misheva
Summary: This study presents a proof-of-concept, comprehensive, modular framework for AI-driven drug discovery (DD) and clinical trial simulation, spanning from target identification to virtual patient recruitment. Synthesized from a systematic analysis of 51 large language model (LLM)-based systems, the proposed Prompt-to-Pill architecture and its corresponding implementation leverage a multi-agent system (MAS) divided into DD, preclinical, and clinical phases, coordinated by a central Orchestrator. Each phase comprises specialized LLMs for molecular generation, toxicity screening, docking, trial design, and patient matching. To demonstrate the full pipeline in practice, the well-characterized target Dipeptidyl Peptidase 4 (DPP4) was selected as a representative use case. The process begins with generative molecule creation and proceeds through ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) evaluation, structure-based docking, and lead optimization. Clinical-phase agents then simulate trial generation, screen patient eligibility using electronic health records (EHRs), and predict trial outcomes. By tightly integrating generative, predictive, and retrieval-based LLM components, this architecture bridges the drug discovery and preclinical phases with virtual clinical development, demonstrating how LLM-based agents can operationalize the drug development workflow in silico.
Availability and implementation: The implementation and code are available at: https://github.com/ChatMED/Prompt-to-Pill.
Title: Prompt-to-Pill: Multi-Agent Drug Discovery and Clinical Simulation Pipeline. Journal: Bioinformatics Advances 6(1), vbaf323. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800774/pdf/
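The stage order described in the Prompt-to-Pill summary above can be pictured with a minimal orchestration sketch. It is only an illustration under stated assumptions: the stage names are paraphrased from the abstract, while `PipelineState`, `make_stub_agent`, and `Orchestrator` are hypothetical names, and each agent is a stub that records a placeholder string rather than calling an LLM. It does not reproduce the code in the ChatMED/Prompt-to-Pill repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Stage names paraphrased from the abstract; the identifiers are assumptions.
STAGE_NAMES = [
    "molecule_generation",
    "admet_evaluation",
    "docking",
    "lead_optimization",
    "trial_generation",
    "patient_eligibility_screening",
    "trial_outcome_prediction",
]


@dataclass
class PipelineState:
    """Shared state handed from one agent to the next (hypothetical structure)."""
    target: str
    outputs: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Agent = Callable[[PipelineState], PipelineState]


def make_stub_agent(name: str) -> Agent:
    """Build a placeholder agent; a real agent would call a model or tool here."""
    def agent(state: PipelineState) -> PipelineState:
        state.outputs[name] = f"placeholder result of {name} for {state.target}"
        return state
    return agent


class Orchestrator:
    """Runs the registered agents strictly in order and logs each completed stage."""

    def __init__(self, stages: List[Tuple[str, Agent]]) -> None:
        self.stages = stages

    def run(self, target: str) -> PipelineState:
        state = PipelineState(target=target)
        for name, agent in self.stages:
            state = agent(state)
            state.log.append(name)
        return state


orchestrator = Orchestrator([(n, make_stub_agent(n)) for n in STAGE_NAMES])
result = orchestrator.run("DPP4")
print(result.log)
```

Passing one state object through every stage keeps the sketch short; a fuller design might let the orchestrator branch or retry stages, which the abstract does not specify.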
Pub Date: 2025-12-23; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf322
Tung-Lin Tsai, Chien-Chong Hong, Hsing-Wen Cheng, Chin-An Yang
Motivation: Early detection of severe bloodstream infections is essential for early treatment initiation. However, the suspicion of bacteremia relies on the combined interpretation of routine laboratory tests, such as complete blood count (CBC), differential count (DC), and elevated C-reactive protein (CRP). Furthermore, a definite diagnosis of bacteremia requires a positive blood culture, which takes several days.
Results: We developed the Interpretable Hematology analyzer Impedance data-based Tabular network for early identification of Bacteremia in Emergency Department (IHIT-BED), a bloodstream infection prediction system built with machine learning methods on integrated data: hematology analyzer impedance histogram signals from the CBC, blood culture reports, and CRP levels, all measured in the first blood draw of patients visiting the emergency department (ED). To our knowledge, IHIT-BED is the first predictor based on hematology impedance histogram signals. It performs well in predicting a positive blood culture and severe inflammation, and it is also sensitive to changes in blood cell morphology that correlate with active inflammatory responses to bacterial infection. IHIT-BED provides clinical decision support for prompt initiation of antibiotic treatment.
Availability and implementation: The method is available at https://github.com/appleRtsan/IHIT-BED.
Title: IHIT-BED: an interpretable transformer approach using unbiased hematology analyzer impedance data for early identification of bacteremia in emergency department. Journal: Bioinformatics Advances 6(1), vbaf322. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12895069/pdf/
Pub Date: 2025-12-19; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf319
Naroa Barrena, Carlos Rodriguez-Flores, Luis V Valcárcel, Danel Olaverri-Mendizabal, Xabier Agirre, Felipe Prósper, Francisco J Planes
Motivation: The integration of genome-scale metabolic and regulatory networks has received significant interest in cancer systems biology. However, the identification of lethal genetic interventions in these integrated models remains challenging due to the combinatorial explosion of potential solutions. To address this, we developed the genetic Minimal Cut Set (gMCS) framework, which computes synthetic lethal interactions (minimal sets of gene knockouts that are lethal for cellular proliferation) in genome-scale metabolic networks with signed directed acyclic regulatory pathways. Here, we present a novel formulation to calculate genetic Minimal Intervention Sets (gMISs), which incorporate both gene knockouts and knock-ins.
Results: With our gMIS approach, we assessed the landscape of lethal genetic interactions in human cells, capturing interventions beyond synthetic lethality, including synthetic dosage lethality and tumor suppressor gene complexes. We applied the concept of synthetic dosage lethality to predict essential genes in cancer and demonstrated a significant increase in sensitivity when compared to large-scale gene knockout screen data. We also analyzed tumor suppressors in cancer cell lines and identified lethal gene knock-in strategies. Finally, we demonstrate how gMISs can help uncover potential therapeutic targets, providing examples in hematological malignancies.
Availability and implementation: The gMCSpy Python package now includes gMIS functionalities. Access: https://github.com/PlanesLab/gMCSpy.
Title: Beyond synthetic lethality in large-scale metabolic and regulatory network models via genetic minimal intervention set. Journal: Bioinformatics Advances 6(1), vbaf319. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784249/pdf/
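The notion of a minimal intervention set in the abstract above (a smallest combination of knockouts and knock-ins that abolishes proliferation) can be illustrated on a toy example. The sketch below is a brute-force enumeration over a hypothetical four-gene Boolean growth rule; the genes `A`, `B`, `C`, `T`, the rule itself, and all function names are assumptions made for illustration. It does not use the gMCSpy API and is not the authors' formulation, which targets genome-scale models where exhaustive enumeration is infeasible.

```python
from itertools import combinations

# Hypothetical toy model: growth needs (A or B) and C, and is blocked when the
# tumor-suppressor-like gene T is active. In the wild type, T is off.
GENES = ["A", "B", "C", "T"]
WILD_TYPE = {"A": True, "B": True, "C": True, "T": False}


def grows(state):
    """Boolean growth rule of the toy model (an assumption, not real biology)."""
    return (state["A"] or state["B"]) and state["C"] and not state["T"]


def apply_interventions(interventions):
    """Knockouts ('ko') force a gene off; knock-ins ('ki') force it on."""
    state = dict(WILD_TYPE)
    for kind, gene in interventions:
        state[gene] = (kind == "ki")
    return state


def minimal_intervention_sets(max_size=3):
    """Enumerate lethal intervention sets by increasing size, keeping only minimal ones."""
    candidates = [(kind, gene) for gene in GENES for kind in ("ko", "ki")]
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            genes = [gene for _, gene in combo]
            if len(set(genes)) < len(genes):
                continue  # a gene cannot be knocked out and knocked in at once
            current = frozenset(combo)
            if any(previous <= current for previous in found):
                continue  # contains a smaller lethal set, so it is not minimal
            if not grows(apply_interventions(combo)):
                found.append(current)
    return found


solutions = minimal_intervention_sets()
for solution in solutions:
    print(sorted(solution))
```

In this toy model the single knockout of `C` is an essential-gene case, the pair `A`/`B` is a synthetic lethal pair, and the knock-in of `T` is a lethal gain of function, which mirrors the three kinds of intervention the abstract distinguishes.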
Motivation: Leishmania infantum is the primary cause of visceral leishmaniasis (VL), and its trypanothione reductase (TR) creates a favorable environment for the parasite in the host, making TR an attractive drug target. This study aims to identify potential TR inhibitors from Azadirachta indica phytochemicals using molecular modeling techniques.
Results: Sixty compounds from A. indica were screened via molecular docking for their binding affinity to TR, followed by binding free energy calculations. Drug-likeness, pharmacokinetics, and toxicity properties of the hit compounds were then evaluated. The top compounds were subjected to a 100 ns molecular dynamics (MD) simulation to further assess the stability of their interaction with TR. Ten of the screened compounds exhibited higher affinity for TR than miltefosine (the standard drug), with docking scores ranging from -3.501 to -8.482 kcal/mol versus -3.231 kcal/mol for miltefosine. All the drug-like hit compounds showed favorable pharmacokinetic and toxicity profiles, and their binding free energies indicated stable interactions. MD simulations showed that these interactions persisted for most of the simulation time, supporting the stability and potential efficacy of the compounds as TR inhibitors.
Availability and Implementation: This study identifies isorhamnetin, meliantriol, and quercetin as promising candidates for further in vitro and in vivo evaluation for the development of TR inhibitors against L. infantum.
Title: Computational identification of Azadirachta indica compounds targeting trypanothione reductase in Leishmania infantum. Authors: Onile Olugbenga Samson, Olukunle Samuel, Fadahunsi Adeyinka Ignatius, Onile Tolulope Adelonpe, Momoh Abdul, Kolawole Oladipo, Afolabi Titilope Esther, Raji Omotara, Hassan Nour, Samir Chtita. Journal: Bioinformatics Advances 6(1), vbaf318. Pub Date: 2025-12-17. DOI: 10.1093/bioadv/vbaf318. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776344/pdf/
Pub Date: 2025-12-17; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf301
Timo Sachsenberg, Lindsay K Pino, Marie Brunet, Isabell Bludau, Oliver Kohlbacher, Juan Antonio Vizcaino, Wout Bittremieux
Summary: Mass spectrometry (MS) is a cornerstone technology in modern molecular biology, powering diverse applications across proteomics, metabolomics, lipidomics, glycomics, and beyond. As the field continues to evolve, rapid advancements in instrumentation, acquisition strategies, machine learning, and scalable computing have reshaped the landscape of computational MS. This perspective reviews recent developments and highlights key challenges, including data harmonization, statistical confidence estimation, repository-scale analysis, multi-omics integration, and privacy in clinical MS. We also discuss the increasing importance of machine learning and the need to build corresponding literacy within the community. Finally, we reflect on the role of the Computational Mass Spectrometry (CompMS) Community of Special Interest of the International Society for Computational Biology in supporting collaboration, innovation, and knowledge exchange. With MS-based technologies now central to both basic and translational research, continued investment in robust and reproducible computational methods will be essential to realize their full potential.
Title: Perspectives in computational mass spectrometry: recent developments and key challenges. Journal: Bioinformatics Advances 5(1), vbaf301. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715313/pdf/
Motivation: Drug-target interaction (DTI) prediction accelerates drug discovery by identifying interactions between chemical compounds and proteins. Existing methods often rely on drug-drug and protein-protein similarity graphs but process them independently, limiting their ability to model interdependencies between modalities. Moving beyond isolated embedding generation from protein and drug graphs, we propose DCGAT-DTI, a novel deep learning framework with a dynamic cross-graph attention (DCGAT) module that dynamically models intra- and cross-graph interactions. Initial embeddings are generated using pretrained language models. Similarity graphs constructed from these embeddings are passed to DCGAT, which uses a Graph Convolutional Network-based Cross-Neighborhood Selection network to dynamically select cross-modal neighbors. This allows drug and protein embeddings to incorporate information from both modalities through intra- and cross-graph attention mechanisms.
Results: Extensive evaluations on four benchmark datasets demonstrate that DCGAT-DTI outperforms state-of-the-art methods across warm and cold start splits for both balanced and unbalanced datasets. In the challenging unbalanced cold start scenarios, it achieves significant improvement in performance for both drugs and proteins over the baselines.
Availability and implementation: Source code is available at https://github.com/compbiolabucf/DCGAT-DTI.
Title: DCGAT-DTI: dynamic cross-graph attention network for drug-target interaction prediction. Authors: Abrar Rahman Abir, Muhtasim Noor Alif, Wencai Zhang, Khandakar Tanvir Ahmed, Wei Zhang. Journal: Bioinformatics Advances 6(1), vbaf306. Pub Date: 2025-12-15. DOI: 10.1093/bioadv/vbaf306. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776360/pdf/
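Two ingredients named in the DCGAT-DTI abstract above, similarity graphs built from embeddings and attention across the drug and protein modalities, can be sketched with NumPy. This is a simplified illustration under stated assumptions: random vectors stand in for pretrained language model embeddings, and a plain top-k dot-product rule replaces the paper's Graph Convolutional Network-based Cross-Neighborhood Selection network. The function names `cosine_knn_graph` and `cross_attention` are hypothetical and are not taken from the DCGAT-DTI repository.

```python
import numpy as np


def cosine_knn_graph(emb, k):
    """0/1 adjacency linking each row to its k most cosine-similar other rows."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=int)
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, neighbors.ravel()] = 1
    return adj


def cross_attention(queries, keys, k):
    """Each query attends over its k highest-scoring keys from the other modality.

    Top-k selection by scaled dot product is a stand-in heuristic (an assumption),
    not the learned neighbor selection described in the paper.
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    top = np.argsort(-scores, axis=1)[:, :k]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, top, 0.0, axis=1)
    masked = scores + mask
    masked -= masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    weights /= weights.sum(axis=1, keepdims=True)
    # Residual update: keep the query and add the attended cross-modal summary.
    return queries + weights @ keys, weights


rng = np.random.default_rng(0)
drug_emb = rng.normal(size=(5, 8))     # placeholder for drug embeddings
protein_emb = rng.normal(size=(4, 8))  # placeholder for protein embeddings

drug_graph = cosine_knn_graph(drug_emb, k=2)
updated_drugs, attn = cross_attention(drug_emb, protein_emb, k=2)
print(drug_graph.sum(axis=1), updated_drugs.shape)
```

Swapping the roles of the two embedding matrices gives the protein-side update, so both modalities can absorb information from the other.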
Pub Date: 2025-12-14; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbaf316
Nidhi Pai, J Sunil Rao
Motivation: Understanding the role of DNA methylation in oncogenesis, diagnosis, and treatment requires data sufficient in size and accuracy, but current epigenetic data is limited, especially for population groups underrepresented in research. We propose a framework for generating highly accurate DNA methylation predictions using classified mixed model prediction, incorporating a step to cluster patients into cross-cancer and cross-race groups.
Results: Simulations show our framework more accurately predicts underlying mixed effects compared to regression prediction and naive estimates, extending previous work to the case where clusters are estimated from the data. We illustrate this framework using data from The Cancer Genome Atlas, uncovering clustering patterns and generating DNA methylation predictions for further analysis. Our work demonstrates how shared random effects can be leveraged to borrow strength across observations with similar methylation patterns.
Availability and implementation: The methods are implemented in R and available at: https://github.com/nidhipai/dnam_cmmp.
Title: Data-based clustering in prediction of cervical cancer DNA methylation using pan-cancer genetic and clinical data. Journal: Bioinformatics Advances 6(1), vbaf316. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776347/pdf/
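The idea of borrowing strength through shared random effects, mentioned in the abstract above, can be shown in a deliberately reduced setting. The sketch below assumes an intercept-only random-effects model with a known variance ratio and with groups given in advance, whereas the paper estimates clusters from the data and works with covariates. It first computes shrunken group effects, then matches new observations to the closest fitted group and returns that group's fitted effect, in the spirit of classified mixed model prediction. The data values and function names are invented for illustration, and this Python sketch is unrelated to the R code in the dnam_cmmp repository.

```python
import numpy as np


def fit_random_intercepts(groups, var_ratio):
    """Shrunken group effects for y_ij = mu + alpha_i + e_ij.

    var_ratio is sigma_e^2 / sigma_alpha^2 and is assumed known here.
    """
    names = list(groups)
    means = np.array([np.mean(groups[g]) for g in names])
    sizes = np.array([len(groups[g]) for g in names])
    shrink = sizes / (sizes + var_ratio)          # per-group shrinkage factor
    mu = np.sum(shrink * means) / np.sum(shrink)  # weighted estimate of mu
    alpha = shrink * (means - mu)                 # predicted random effects
    return names, mu, alpha


def classify_and_predict(new_obs, names, mu, alpha):
    """Match new observations to the nearest fitted group, then predict its mixed effect."""
    fitted = mu + alpha
    idx = int(np.argmin((fitted - np.mean(new_obs)) ** 2))
    return names[idx], float(fitted[idx])


# Invented toy data: three groups with three observations each.
groups = {
    "g1": [1.0, 1.2, 0.8],
    "g2": [3.0, 3.1, 2.9],
    "g3": [5.0, 4.8, 5.2],
}
names, mu, alpha = fit_random_intercepts(groups, var_ratio=1.0)
matched, prediction = classify_and_predict([4.6, 4.9], names, mu, alpha)
print(matched, prediction)
```

The prediction is pulled toward the overall mean relative to the raw average of the new observations, which is the shrinkage effect that lets sparse groups benefit from the rest of the data.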
Motivation: Many topics within the study of the origin and early evolution of life are amenable to computational research strategies. Over a decade ago, the original LUCApedia was developed in order to facilitate such research. Here we describe a massively overhauled LUCApedia database and web server.
Results: The database is composed of 17 different datasets based on previous studies or published hypotheses about the last universal common ancestor and its evolutionary predecessors. Similar to the original LUCApedia database, these datasets are mapped onto a common framework so that they can be corroborated with one another and used to examine continuity across different stages of early evolution.
Availability and implementation: The database can be searched, browsed, and downloaded from the LUCApedia web server, https://lucapedia.org/.
Title: A new LUCApedia database for data-driven research on early evolutionary history. Authors: Zahra Nikfarjam, Ishaan Thota, Alireza Nikfarjam, Freya Kailing, Aaron D Goldman. Journal: Bioinformatics Advances 6(1), vbaf309. Pub Date: 2025-12-04. DOI: 10.1093/bioadv/vbaf309. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12776356/pdf/
Pub Date: 2025-12-02; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf312
Lapo Doni, Alessia Marotta, Luigi Vezzulli, Emanuele Bosi
Motivation: The revolution of next-generation sequencing has driven the establishment of metabarcoding as an efficient and cost-effective method for exploring community composition. Amplicon sequencing of taxonomic marker genes, such as the 16S rRNA gene in prokaryotes, provides an efficient method for high-throughput taxonomic profiling. The advent of long-read technologies has made it feasible to sequence the whole 16S rRNA gene rather than only a few regions, with the potential to achieve species-level resolution. Despite the affordability and scalability of such experiments, a major bottleneck remains the lack of integrated and user-friendly analytical workflows. Current pipelines often require multiple tools with complex dependencies, and parameter optimization is frequently performed manually, limiting reproducibility and overall efficiency.
Results: To address these limitations, we developed AmpWrap, an automated, one-line workflow designed to analyse both Illumina and Nanopore amplicons. It requires minimal effort from the user and automatically optimizes the trimming parameter to retain the maximum number of reads and information while reducing noise.
Availability and implementation: AmpWrap is available at: https://github.com/LDoni/AmpWrap.
Title: AmpWrap: a one-line fully automated amplicon metabarcoding 16S and 18S rRNA gene analysis. Journal: Bioinformatics Advances 5(1), vbaf312. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701800/pdf/
Pub Date: 2025-12-02; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf311
Bernardo Velozo, Clara Carvalho, Rayssa Feitosa, Lucas Aleixo Leal Pedroza, Emerson Danzer, Sandy Ingrid Aguiar Alves, Maira Neves, Bibiana Fam
Motivation: Bioinformatics drives modern biological discovery, and Brazil has become an important contributor to genomics and computational biology. However, bioinformatics education across the country struggles to meet diverse regional and professional demands. To respond to these challenges, the Regional Student Group of Brazil created an Educational Committee in 2019 to expand Portuguese-language resources and evaluate national training needs. Here, we apply the ISCB Core Competency 3.0 framework to establish a seven-domain training model spanning foundational biological, statistical, and computational skills, ethical principles, applied bioinformatics practices, communication abilities, and continuous professional development.
Results: A nationwide survey of 375 respondents from more than 21 Brazilian states revealed pronounced geographic and career-based disparities in bioinformatics training. Individuals who primarily use bioinformatics tools, largely students, showed strong interest in phylogenetics and evolutionary analyses, while those focused on software and tool development prioritized computational methods. These findings demonstrate how educational needs differ across profiles and regions, emphasizing the importance of localized strategies to address Brazil's heterogeneous training landscape. Unlike broad competency frameworks, this data-driven approach identifies specific gaps and areas of high demand.
Availability and implementation: By integrating these insights, the Regional Student Group of Brazil proposes an equitable and scalable education model that supports curriculum development and helps strengthen training in regions with limited opportunities, offering a framework adaptable to global scientific communities facing similar socioeconomic challenges.
Bernardo Velozo, Clara Carvalho, Rayssa Feitosa, Lucas Aleixo Leal Pedroza, Emerson Danzer, Sandy Ingrid Aguiar Alves, Maira Neves, Bibiana Fam. Mapping educational needs in bioinformatics in Brazil: adapting ISCB 3.0 competencies to a regional context. Bioinformatics Advances 5(1), vbaf311. Published 2025-12-02. DOI: 10.1093/bioadv/vbaf311. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714386/pdf/