Pub Date: 2025-11-09; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf188
Judit Juhász, Noémi Ligeti-Nagy, Babett Bodnár, János Juhász, Sándor Pongor, Balázs Ligeti
Motivation: Phage lifestyle prediction, i.e. classifying phage sequences as virulent or temperate, is crucial in biomedical and ecological applications. Phage sequences from metagenome or virome assemblies are often fragmented, and the diversity of environmental phages is not well known. Current computational approaches often rely on database comparisons that require significant effort and expertise to update. We propose using genomic language models (LMs) for phage lifestyle classification, allowing efficient direct analysis from nucleotide sequences without the need for sophisticated preprocessing pipelines or manually curated databases. We trained three genomic LMs (DNABERT-2, Nucleotide Transformer, and ProkBERT) on datasets of short, fragmented sequences. These models were then compared with dedicated phage lifestyle prediction methods in terms of accuracy, prediction speed, and generalization capability.
Results: ProkBERT PhaStyle achieves accuracy comparable to, and in many cases higher than, state-of-the-art models across various scenarios. It generalizes to unseen data in our benchmarks, accurately classifies phages from extreme environments, and offers high inference speed.
Availability and implementation: Genomic LMs offer a simple and computationally efficient alternative for solving complex classification tasks, such as phage lifestyle prediction. ProkBERT PhaStyle's simplicity, speed, and performance suggest its utility in various ecological and clinical applications.
Title: ProkBERT PhaStyle: accurate phage lifestyle prediction with pretrained genomic language models. Journal: Bioinformatics Advances, 5(1), vbaf188. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12603353/pdf/
Pub Date: 2025-11-09; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf280
Daan R Speth, Nick Pullen, Samuel T N Aroney, Benjamin L Coltman, Jay Osvatic, Ben J Woodcroft, Thomas Rattei, Michael Wagner
Motivation: In recent years, substantial numbers of microbial species' genomes have been deposited outside of conventional INSDC databases.
Results: The GlobDB aggregates 14 independent genomic catalogues to provide a comprehensive database of species-dereplicated microbial genomes, with consistent taxonomy, annotations, and additional analysis resources. The GlobDB more than doubles the number of microbial species represented by genomes relative to the field-standard Genome Taxonomy Database.
Availability and implementation: The GlobDB is available at https://globdb.org/.
Title: GlobDB: a comprehensive species-dereplicated microbial genome resource. Journal: Bioinformatics Advances, 5(1), vbaf280. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629238/pdf/
Pub Date: 2025-11-08; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf284
Benjamin Minch, Mohammad Moniruzzaman
Motivation: Viruses in the kingdom Bamfordvirae, specifically giant viruses (NCLDVs) in the phylum Nucleocytoviricota and smaller members in the Preplasmiviricota phylum, are widespread and important groups of viruses that infect eukaryotes. While viruses in this kingdom, such as giant viruses, polinton-like viruses, and virophages, have gained large interest from researchers in recent years, there is still a lack of streamlined tools for the recovery of their genomes from metagenomic datasets.
Results: Here, we present BEREN, a comprehensive bioinformatic tool to unlock the diversity of these viruses in metagenomes through five modules for NCLDV genome, contig, and marker gene recovery, metabolic protein annotation, and Preplasmiviricota genome identification and annotation. BEREN's performance was benchmarked against other mainstream virus recovery tools using a mock metagenome, demonstrating superior recovery rates of NCLDV contigs and Preplasmiviricota genomes. Overall, BEREN offers a user-friendly, transparent bioinformatic solution for studying the ecological and functional roles of these eukaryotic viruses, facilitating broader access to their metagenomic analysis.
Availability and implementation: BEREN is available at https://gitlab.com/benminch1/BEREN, and results from testing BEREN on a real-world metagenome are available in the Supplementary Files.
Title: BEREN: a bioinformatic tool for recovering giant viruses, polinton-like viruses, and virophages in metagenomic data. Journal: Bioinformatics Advances, 5(1), vbaf284. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12638062/pdf/
Pub Date: 2025-11-08; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf251
Loris Nanni, Sheryl Brahnam, Daniel Fusaro
Summary: Selecting an appropriate classifier is essential for achieving accurate classification. In this study, we propose novel neural network (NN)-based alternatives to standard classifiers such as support vector machines. NNs, particularly convolutional neural networks and transformer networks, have shown exceptional performance in processing image data. To leverage this capability, we explore methods for transforming 1D vector data into 2D matrix representations, enabling the application of NNs pre-trained on large-scale image datasets. Specifically, we introduce a new data restructuring technique based on Wigner transforms, and we compare it with many methods proposed in the literature. The effectiveness and robustness of our approach are assessed using various benchmark datasets, from peptide classification to DNA barcoding classification, demonstrating consistently strong performance.
Availability and implementation: All source code and related resources used in this work are made publicly available at https://github.com/LorisNanni/Matrix-Representation-of-Vectors-in-Neural-Networks-for-Data-Classification.
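The general idea of restructuring a 1D feature vector into a 2D matrix, so that an image-oriented network can consume it, can be illustrated with a very small sketch. This is not the Wigner-transform technique the authors introduce. It is only a plain zero-padded, row-major reshape, and the function name `vector_to_matrix` and its `pad_value` parameter are hypothetical.

```python
import math

def vector_to_matrix(vec, pad_value=0.0):
    """Reshape a 1D feature vector into the smallest square 2D matrix
    that can hold it, filling row by row and padding the tail.
    (Illustrative only; not the Wigner-based method from the paper.)"""
    n = len(vec)
    side = math.isqrt(n)          # floor of the square root
    if side * side < n:           # round up when n is not a perfect square
        side += 1
    padded = list(vec) + [pad_value] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]

matrix = vector_to_matrix([1, 2, 3, 4, 5])
for row in matrix:
    print(row)
```

A square layout is chosen here only because pre-trained image networks usually expect square inputs; how neighbouring features are arranged in the matrix is exactly the design question the compared methods answer differently.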
Title: Matrix-based vector representations in neural networks for classifying molecular biology data. Journal: Bioinformatics Advances, 5(1), vbaf251. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701790/pdf/
Pub Date: 2025-11-05; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf278
Hongen Kang, Yin-Ying Wang, Peilin Jia
Motivation: Experimentally generated drug-induced transcriptomic signatures are valuable resources to infer candidate drugs for unseen transcriptomes. The Connectivity Map (CMap) includes over 720 000 compound-induced signatures and has been widely used in drug repurposing. However, the computational resources required for an unbiased screen across all these signatures, along with the inconsistent results from different methods, pose major challenges for connectivity analyses.
Results: In this study, we developed WebCMap, an R package to search for candidate compounds with similar or reverse activities across all CMap drug-induced signatures. WebCMap implements six widely used methods and a meta-score to evaluate the consistency among these methods. Through a web-accelerated framework, pre-calculated statistics for the permutation test, and multi-core parallelization, WebCMap enables fast screening and retrieval of the results on personal computers within a reasonable time.
Availability and implementation: WebCMap is available at https://github.com/geneprophet/WebCMap.
Title: WebCMap: an R package for high-throughput connectivity analysis within the CMap framework. Journal: Bioinformatics Advances, 5(1), vbaf278. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629228/pdf/
Summary: To enable flexible, scalable, and reproducible microbiota profiling, we have developed zAMP, an open-source bioinformatics pipeline for the analysis of amplicon sequence data, such as 16S rRNA gene for bacteria and archaea or ITS for fungi. zAMP is complemented by two modules: one to process databases to optimize taxonomy assignment, and the second to benchmark primers, databases and classifier performances. Coupled with zAMPExplorer, an interactive R Shiny application that provides an intuitive interface for quality control, diversity analysis, and statistical testing, this complete toolbox addresses both research and clinical needs in microbiota profiling.
Availability and implementation: Comprehensive documentation and tutorials are provided alongside the source code of zAMP and zAMPExplorer software to facilitate installation and use. zAMP is implemented as a Snakemake workflow, ensuring reproducibility by running within Singularity or Docker containers, and is also easily installable via Bioconda. The zAMPExplorer application, designed for visualization and statistical analysis, can be installed using either a Docker image or from R-universe.
Title: zAMP and zAMPExplorer: reproducible scalable amplicon-based metagenomics analysis and visualization. Authors: Valentin Scherz, Sedreh Nassirnia, Farid Chaabane, Violeta Castelo-Szekely, Gilbert Greub, Trestan Pillonel, Claire Bertelli. Pub Date: 2025-11-04; DOI: 10.1093/bioadv/vbaf255. Journal: Bioinformatics Advances, 5(1), vbaf255. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12603355/pdf/
Pub Date: 2025-11-03; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf273
Sergio Hernández-Galaz, Andrés Hernández-Olivera, Felipe Villanelo, Alvaro Lladser, Alberto J M Martin
Summary: Computational analysis of single-cell RNA sequencing (scRNA-seq) data presents significant barriers for researchers lacking programming expertise, particularly for multi-dataset integration, scalable job management, and reproducible workflows. We developed scExplorer, a web-based platform that addresses these limitations through three key innovations: comprehensive batch correction using four state-of-the-art algorithms (ComBat, Scanorama, BBKNN, and Harmony), SLURM-based job scheduling with pause/resume functionality for large-scale analyses, and automated generation of publication-ready reports with exportable configuration files ensuring complete reproducibility. The platform's modular Docker architecture supports both standalone and client-server deployments, enabling analysis of datasets ranging from thousands to hundreds of thousands of cells. An openly documented REST API clarifies how the interface orchestrates analyses and supports transparent operation. scExplorer eliminates the technical barriers that prevent non-computational researchers from performing rigorous scRNA-seq analysis while maintaining the transparency and reproducibility standards required for collaborative research.
Availability and implementation: https://apps.cienciavida.org/scexplorer/.
Title: scExplorer: a comprehensive web server for single-cell RNA sequencing data analysis. Journal: Bioinformatics Advances, 5(1), vbaf273. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627405/pdf/
Pub Date: 2025-10-31; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf272
Pietro Cinaglia, Mario Cannataro
Motivation: A comprehensive, in-depth understanding of gene expression dynamics is essential for deciphering intricate biological mechanisms; network science, and Gene Co-expression Networks (GCNs) specifically, can address this effectively. However, a typical GCN is based on a static model, which limits its ability to reflect changes that occur over time. To overcome this issue, we designed an open-source, user-friendly web-service for constructing temporal networks from genotype-tissue expression data: COnstructing Real-world TEmporal networks (CoRTE).
Results: CoRTE bases the construction of a temporal network on the statistical analysis of the related gene co-expressions across successive age ranges, which define an ordered set of time points. In our experiments, we investigated gene co-expression dynamics across age groups in brain tissues associated with Alzheimer's Disease, processing curated aging-related data via the proposed web-service. The web-service generated a temporal network consisting of a set of gene pairs that showed statistically significant co-expression over time. The results demonstrated its capacity to capture time-dependent gene interactions relevant to aging-related disease progression. From a practical point of view, CoRTE may be particularly suitable for exploring aging-related changes, disease development, and other time-dependent biological events.
Availability and implementation: CoRTE is freely available at https://github.com/pietrocinaglia/corte-ws.
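The construction described in the Results (one co-expression snapshot per age range, with the ordered age ranges acting as time points) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not CoRTE's implementation: it keeps an edge when the absolute Pearson correlation exceeds a fixed threshold rather than applying a statistical significance test, and the function names, gene labels, age ranges, and expression values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)

def temporal_network(expr_by_age, threshold=0.8):
    """Return one edge list per age range, in the order given.

    expr_by_age: list of (label, {gene: [expression per sample]}) tuples,
    already sorted by age so that list order defines the time points.
    Assumption: a correlation threshold stands in for a significance test.
    """
    snapshots = []
    for label, expr in expr_by_age:
        genes = sorted(expr)
        edges = []
        for i, g1 in enumerate(genes):
            for g2 in genes[i + 1:]:
                r = pearson(expr[g1], expr[g2])
                if abs(r) >= threshold:
                    edges.append((g1, g2, round(r, 3)))
        snapshots.append((label, edges))
    return snapshots

# Toy data: hypothetical genes A, B, C measured in two age ranges.
data = [
    ("20-39", {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}),
    ("40-59", {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 2, 3, 5]}),
]
network = temporal_network(data)
for label, edges in network:
    print(label, edges)
```

Comparing the edge lists of consecutive snapshots shows which gene pairs gain, lose, or reverse their co-expression between age ranges, which is the kind of time-dependent signal a static GCN cannot represent.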
Title: CoRTE: a web-service for constructing temporal networks from genotype-tissue expression data. Journal: Bioinformatics Advances, 5(1), vbaf272. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633645/pdf/
Pub Date: 2025-10-31; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf274
Linfeng Wang, Susana Campino, Taane G Clark, Jody E Phelan
Motivation: Tuberculosis, caused by Mycobacterium tuberculosis, remains a global health challenge driven by rising antibiotic resistance. Antimicrobial peptides offer a promising alternative due to membrane-disruptive activity and low resistance potential, yet the scarcity of TB-specific AMP data constrains targeted development. We present a reproducible deep learning protocol that integrates long short-term memory networks with transfer learning to classify and generate TB-active peptides.
Results: Classifiers were pretrained on a large corpus of general AMPs and fine-tuned on curated TB-specific sequences using frozen encoder and full backpropagation strategies. We benchmarked four model variants [unidirectional and bidirectional long short-term memories (LSTMs), with and without attention] on a held-out TB test set; the unidirectional LSTM with a frozen encoder achieved the best performance (accuracy 90%, AUC 0.97). In parallel, LSTM-based generative models were trained to produce de novo TB-active peptides. A generator trained exclusively on TB data produced 94 of 100 peptides predicted as antimicrobial by AMP Scanner, outperforming transfer learning-based generators. Generated peptides were evaluated for antimicrobial activity, toxicity, structure, and AMP-like physicochemical traits, and four candidates shared ≥84% identity with known TB-AMPs.
Availability and implementation: The complete model and data can be found at: https://github.com/linfeng-wang/TB-AMP-design.
Title: Long short-term memory-based deep learning model for the discovery of antimicrobial peptides targeting Mycobacterium tuberculosis. Journal: Bioinformatics Advances, 5(1), vbaf274. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12603352/pdf/
Pub Date: 2025-10-29; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf263
Raziyeh Masumshah, Changiz Eslahchi
Motivation: Integrating heterogeneous biological data is a central challenge in bioinformatics, especially when modeling complex relationships among entities such as drugs, diseases, and molecular features. Existing methods often rely on static or separate feature extraction processes, which may fail to capture interactions across diverse feature types and can thereby reduce predictive accuracy.
Results: To address these limitations, we propose PSO-FeatureFusion, a unified framework that combines particle swarm optimization with neural networks to jointly integrate and optimize features from multiple biological entities. By modeling pairwise feature interactions and learning their optimal contributions, the framework captures individual feature signals and their interdependencies in a task-agnostic and modular manner. We applied PSO-FeatureFusion to two bioinformatics tasks, drug-drug interaction prediction and drug-disease association prediction, using multiple benchmark datasets. On both tasks, the framework performed strongly across evaluation metrics, often matching or outperforming state-of-the-art baselines, including deep learning and graph-based models. The method was also robust under limited hyperparameter tuning and flexible across datasets with varying feature structures. PSO-FeatureFusion provides a scalable and practical solution for researchers working with high-dimensional biological data. Its adaptability and interpretability make it well suited for applications in drug discovery, disease prediction, and other bioinformatics domains.
Availability and implementation: The source code and datasets are available at https://github.com/raziyehmasumshah/PSO-FeatureFusion.
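The idea of learning how much each feature source should contribute can be illustrated with a small particle swarm optimization (PSO) loop. The following is a minimal sketch, not the authors' implementation: it replaces the neural network with a plain weighted sum, uses synthetic data, and every name in it (pso_minimize, make_toy_data, fusion_loss, and the inertia and acceleration values) is a hypothetical choice made for illustration only.

```python
import random


def pso_minimize(loss, dim, n_particles=20, n_iters=100, bounds=(0.0, 1.0),
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `loss` over a box with a basic global-best PSO (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random start positions inside the box, zero start velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [loss(p) for p in pos]         # personal best losses
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = momentum + pull to personal best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def make_toy_data(n=200, seed=1):
    """Synthetic stand-in: three feature-source scores and a fused target."""
    rng = random.Random(seed)
    true_w = [0.6, 0.1, 0.3]                  # hypothetical contributions
    rows = [[rng.random() for _ in true_w] for _ in range(n)]
    targets = [sum(w * x for w, x in zip(true_w, row)) for row in rows]
    return rows, targets, true_w


def fusion_loss(weights, rows, targets):
    """Mean squared error of the weighted-sum fusion against the targets."""
    total = 0.0
    for row, target in zip(rows, targets):
        pred = sum(w * x for w, x in zip(weights, row))
        total += (pred - target) ** 2
    return total / len(rows)


rows, targets, true_w = make_toy_data()
best_w, best_loss = pso_minimize(lambda w: fusion_loss(w, rows, targets), dim=3)
print("learned weights:", [round(w, 3) for w in best_w])
print("fusion loss:", best_loss)
```

In a fuller pipeline the loss passed to the optimizer would presumably be a validation metric of the downstream predictor rather than a synthetic regression error; the sketch only shows how a swarm searches the space of contribution weights.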