Pub Date: 2026-02-17; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag060
Yu-Shen Liu, Yan-Ru Ju, Kai-Wei Chang, Chin Lung Lu
Summary: Designing mRNA coding sequences (CDSs) for vaccine development requires co-optimizing secondary structure stability and codon usage, which are typically measured by minimum free energy (MFE) and codon adaptation index (CAI), respectively. To address this challenge, we previously employed dynamic programming and beam search techniques to develop LinearCDSfold, a tool that generates a single CDS encoding a given protein sequence by jointly optimizing MFE and CAI. It produces an exact solution with cubic-time complexity and a high-quality approximation in linear time, both with respect to the CDS length. Since reducing MFE and increasing CAI often conflict during CDS design, it is desirable to automatically generate Pareto-optimal CDSs, for which no alternative improves one objective without worsening the other. To our knowledge, DERNA is the only existing tool with this functionality. In this work, we enhance the capabilities of LinearCDSfold to automatically and efficiently generate a set of Pareto-optimal CDSs. Experiments conducted on nine protein sequences show that LinearCDSfold performs comparably to DERNA in generating Pareto-optimal CDSs while achieving substantially faster runtime.
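To make "Pareto-optimal" concrete, the sketch below filters a set of already-scored candidate CDSs down to the non-dominated ones, with lower MFE and higher CAI both counted as better. It is a minimal illustration under assumed inputs (the `(name, mfe, cai)` tuples and the function names are hypothetical); it does not reproduce how LinearCDSfold or DERNA actually search the sequence space.

```python
from typing import List, Tuple

# Hypothetical candidate record: (name, MFE in kcal/mol, CAI in [0, 1]).
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one.

    Lower MFE (more stable structure) is better; higher CAI is better.
    """
    _, mfe_a, cai_a = a
    _, mfe_b, cai_b = b
    no_worse = mfe_a <= mfe_b and cai_a >= cai_b
    strictly_better = mfe_a < mfe_b or cai_a > cai_b
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (O(n^2), fine for small sets)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


if __name__ == "__main__":
    cands = [("A", -30.0, 0.90), ("B", -40.0, 0.70), ("C", -28.0, 0.85),
             ("D", -35.0, 0.80), ("E", -35.0, 0.75)]
    print([name for name, _, _ in pareto_front(cands)])
```

Here C is dominated by A and E by D, so A, B, and D remain: each trades some stability for codon adaptation, or vice versa, which is exactly the set a designer would choose from.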
Availability and implementation: The program of LinearCDSfold can be downloaded from https://github.com/ablab-nthu/LinearCDSfold.
Title: LinearCDSfold: a tool for co-optimizing secondary structure stability and codon usage in coding sequence design.
Journal: Bioinformatics Advances, 6(1): vbag060
Summary: TAGINE is a feature engineering algorithm that leverages the microbial taxonomic tree to optimize feature sets in microbiome data for predictive modeling. The algorithm starts with features at high taxonomic levels and iteratively splits them into lower-level clades whenever a split improves predictive accuracy, ultimately producing a feature set spanning multiple taxonomic levels. This approach aims to markedly reduce the number of features while preserving biological relevance and interpretability. We compare TAGINE's performance to other standard and taxonomy-based feature engineering methods on several different datasets, and show that TAGINE yields more compact feature sets and is orders of magnitude faster than other methods, while maintaining predictive accuracy.
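The top-down splitting idea can be sketched as a greedy loop over a taxonomy tree. In this minimal sketch a clade's feature value is the summed abundance of its lowest-level taxa, and a clade is replaced by its children only if a user-supplied `score` function improves. The tree encoding, the function names, and the acceptance rule are illustrative assumptions; in practice `score` would be something like cross-validated predictive accuracy, and TAGINE's actual procedure may differ in detail.

```python
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, List[str]]        # clade -> child clades (lowest-level taxa have no entry)
Features = Dict[str, List[float]]  # feature name -> per-sample values


def leaves_under(tree: Tree, clade: str) -> List[str]:
    """All lowest-level taxa below `clade` (or the clade itself if it has no children)."""
    children = tree.get(clade, [])
    if not children:
        return [clade]
    return [leaf for child in children for leaf in leaves_under(tree, child)]


def aggregate(tree: Tree, abund: Features, clades: List[str]) -> Features:
    """Per-sample value of a clade feature = summed abundance of its leaf taxa."""
    n = len(next(iter(abund.values())))
    return {c: [sum(abund[leaf][i] for leaf in leaves_under(tree, c)) for i in range(n)]
            for c in clades}


def greedy_split(tree: Tree, abund: Features, roots: List[str],
                 score: Callable[[Features], float]) -> Tuple[List[str], float]:
    """Start from high-level clades; replace a clade by its children only if the score improves."""
    features = list(roots)
    best = score(aggregate(tree, abund, features))
    queue = list(roots)
    while queue:
        clade = queue.pop(0)
        children = tree.get(clade, [])
        if not children:
            continue
        trial = [f for f in features if f != clade] + children
        trial_score = score(aggregate(tree, abund, trial))
        if trial_score > best:  # keep the split, then consider splitting the children too
            features, best = trial, trial_score
            queue.extend(children)
    return features, best
```

Because a split is kept only when it strictly helps, uninformative branches stay collapsed at a high level, which is what keeps the final feature set compact.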
Availability and implementation: TAGINE is freely available under the MIT license with source code available at https://github.com/borenstein-lab/tagine_fe.
Title: TAGINE: fast taxonomy-based feature engineering for microbiome analysis.
Authors: Shiri Baum, Ido Meshulam, Yadid M Algavi, Omri Peleg, Elhanan Borenstein
Pub Date: 2026-02-17; DOI: 10.1093/bioadv/vbag056
Journal: Bioinformatics Advances, 6(1): vbag056
Pub Date: 2026-02-15; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag050
Jaesub Park, Woochang Hwang, Seokjun Lee, Hyun Chang Lee, Méabh MacMahon, Matthias Zilbauer, Namshik Han
Motivation: Long COVID is a multisystem condition characterized by persistent symptoms such as fatigue, cognitive impairment, and systemic inflammation following COVID-19 infection. However, its mechanisms remain poorly understood. In this study, we applied the quantum walk, a computational approach leveraging quantum interference, to explore large-scale SARS-CoV-2-induced protein networks.
Result: Compared to the conventional random walk with restart method, the quantum walk demonstrated superior capacity to traverse deeper regions of the network, uncovering proteins and pathways implicated in Long COVID. Key findings include mitochondrial dysfunction, thromboinflammatory responses, and neuronal inflammation as central mechanisms. Quantum walk uniquely identified the CDGSH iron-sulfur domain-containing protein family and VDAC1, a mitochondrial calcium transporter, as critical regulators of these processes. VDAC1 emerged as a potential biomarker and therapeutic target, supported by FDA-approved compounds such as cannabidiol. These findings highlight quantum walk as a powerful tool for elucidating complex biological systems and identifying novel therapeutic targets for conditions like Long COVID.
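The two walks being compared can be illustrated on a toy graph. The sketch below assumes a continuous-time quantum walk whose Hamiltonian is the plain adjacency matrix, and a standard random walk with restart (RWR) with restart probability 0.3; the Hamiltonian choice, the restart value, and the truncated-Taylor evaluation of the matrix exponential are illustrative assumptions, not the settings used in the study.

```python
from typing import List

Matrix = List[List[float]]


def ctqw_probabilities(adj: Matrix, start: int, t: float, terms: int = 60) -> List[float]:
    """Continuous-time quantum walk with Hamiltonian H = adjacency matrix (an assumption).

    psi(t) = exp(-i H t) e_start, evaluated by a truncated Taylor series, which is
    only adequate for small graphs and moderate values of t * ||H||.
    """
    n = len(adj)
    term = [complex(1.0 if j == start else 0.0) for j in range(n)]
    psi = list(term)
    for k in range(1, terms):
        # k-th Taylor term: (-i t / k) * H * (previous term)
        h_term = [sum(adj[j][m] * term[m] for m in range(n)) for j in range(n)]
        term = [(-1j * t / k) * x for x in h_term]
        psi = [p + x for p, x in zip(psi, term)]
    # Probability of observing the walker at each node is the squared amplitude.
    return [abs(x) ** 2 for x in psi]


def rwr_probabilities(adj: Matrix, start: int, restart: float = 0.3,
                      iters: int = 200) -> List[float]:
    """Random walk with restart: p <- (1 - r) W p + r e_start, W = column-normalized adj."""
    n = len(adj)
    deg = [sum(adj[m][j] for m in range(n)) for j in range(n)]  # column sums
    p = [1.0 if j == start else 0.0 for j in range(n)]
    for _ in range(iters):
        walk = [sum(adj[j][m] * p[m] / deg[m] for m in range(n) if deg[m] > 0)
                for j in range(n)]
        p = [(1 - restart) * w + (restart if j == start else 0.0) for j, w in enumerate(walk)]
    return p


if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4, both walks started at node 0.
    path = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
    print("CTQW t=2:", [round(x, 3) for x in ctqw_probabilities(path, 0, 2.0)])
    print("RWR:     ", [round(x, 3) for x in rwr_probabilities(path, 0)])
```

The RWR distribution settles into a stationary profile that keeps much of its mass near the seed, whereas the quantum walk evolves unitarily through interference, so amplitude can build up at nodes several hops away; that difference in reach is the intuition behind the "deeper regions" comparison.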
Availability and implementation: The code and input data that were used for this study are available at https://github.com/Namshik-Han-Lab/QuantumWalk-LongCovid.
Title: Advancing understanding of long COVID pathophysiology through quantum walk-based network analysis.
Journal: Bioinformatics Advances, 6(1): vbag050
Pub Date: 2026-02-15; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag052
Tao Ma, Jinfu Nie, Jian Huang, Yong-Biao Zhang, Joanna M Biernacka, Liguo Wang
Motivation: Illumina DNA methylation arrays have evolved rapidly, expanding genomic coverage while introducing backward incompatibilities by removing many CpG sites present in earlier versions. These changes produce systematic missing values when data are integrated across array generations, substantially limiting the reuse of legacy datasets.
Results: We developed a two-stage framework for imputing missing DNA methylation values. The procedure first imputes randomly missing values using standard imputation techniques and then addresses systematic missingness using multi-output machine learning models, including support vector regression, nearest-neighbor methods, random forest models, and deep neural networks. When evaluated on real datasets with up to fifty percent induced missingness, the proposed framework consistently outperformed conventional imputation approaches. It also accurately imputes the missing CpG sites between methylation arrays and reduced representation bisulfite sequencing data, enabling robust cross-platform data integration. Analyses of large brain tumor methylation datasets demonstrate that the method restores array-specific methylation patterns while preserving biological complexity. Importantly, imputing missing methylation sites significantly improves the performance of epigenetic age prediction models.
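The two-stage idea can be sketched with two small functions: column-mean filling for sporadic (random) gaps, followed by a multi-output nearest-neighbor model that predicts all systematically absent CpGs at once from CpGs shared with a reference dataset. Mean imputation and k-nearest neighbors are assumed stand-ins chosen for brevity (the abstract lists nearest-neighbor methods among several model families); the function names are hypothetical and this is not the ultra-impute API.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # per-sample beta values; None marks a missing CpG


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Stage 1 (assumed): fill sporadic missing values with the per-CpG column mean."""
    ncol = len(rows[0])
    means = []
    for j in range(ncol):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(ncol)] for r in rows]


def knn_multi_output(ref_x: List[List[float]], ref_y: List[List[float]],
                     query_x: List[List[float]], k: int = 2) -> List[List[float]]:
    """Stage 2 (assumed): predict every systematically missing CpG jointly.

    ref_x/ref_y: shared and target CpGs for reference samples where both are measured.
    query_x: shared CpGs for samples whose target CpGs were never measured.
    """
    preds = []
    for q in query_x:
        nearest = sorted((math.dist(q, rx), i) for i, rx in enumerate(ref_x))[:k]
        idx = [i for _, i in nearest]
        # One prediction vector per query: the neighbors' average over all target CpGs.
        preds.append([sum(ref_y[i][j] for i in idx) / len(idx) for j in range(len(ref_y[0]))])
    return preds
```

Predicting all target CpGs from one model per query (rather than one model per CpG) is what "multi-output" buys: neighboring CpGs share the same neighbor set, so correlated sites are imputed consistently.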
Availability and implementation: This tool is implemented in the Python package "ultra-impute," freely available at https://github.com/liguowang/ultra-impute. A code snippet demonstrating the usage of the ultra-impute package is provided in a Jupyter Notebook (https://github.com/liguowang/ultra-impute/blob/master/doc/Tutorial.ipynb).
Title: Multi-output learning for systematic missing value imputation in DNA methylation arrays.
Journal: Bioinformatics Advances, 6(1): vbag052
Pub Date: 2026-02-15; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag053
Javier Kipen, Matthew Beauregard Smith, Thomas Blom, Sophia Bailing Zhou, Edward M Marcotte, Joakim Jaldén
Summary: Fluorosequencing generates millions of single peptide reads, yet a principled route to quantitative protein abundances has been lacking. We present a probabilistic framework that adapts expectation-maximization (EM) to the fluorosequencing measurement process, using posterior peptide probabilities from existing classifiers to estimate relative protein abundances. The algorithm iteratively updates abundances to maximize the likelihood of observed reads. We first evaluate five-protein simulations with realistic labeling and system errors. A simple Python implementation processes one million reads in under ten seconds on a standard workstation and reduces the mean absolute error by over an order of magnitude relative to a uniform-abundance guess, indicating robust performance in small-scale settings. We also assess scalability with full human-proteome simulations (20 642 proteins). Ten million reads are processed in under four hours on an NVIDIA DGX with a single Tesla V100 GPU, confirming tractability at proteome scale. Under current fluorosequencing error rates, the method yields modest accuracy gains, but when error rates are reduced, estimation error drops markedly, indicating that chemistry improvements would translate directly into more accurate quantitative proteomics. Overall, EM-based inference provides a scalable, model-driven bridge from peptide-level classification to protein-level quantification in fluorosequencing. The framework can also serve as a refinement step within other inference methods.
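The EM update can be sketched as a standard mixture-proportion estimate. This minimal version assumes that each read's per-peptide classifier scores `q[r][p]` are proportional to P(read | peptide), i.e. that the classifier used a flat peptide prior, and that a matrix `m[k][p]` gives P(peptide | protein); both encodings are assumptions, and the paper's adaptation to the fluorosequencing error model is likely more detailed.

```python
from typing import List


def em_abundance(q: List[List[float]], m: List[List[float]],
                 iters: int = 500, tol: float = 1e-10) -> List[float]:
    """Estimate relative protein abundances pi by expectation-maximization.

    q[r][p]: score of peptide p for read r, assumed proportional to P(read r | peptide p).
    m[k][p]: P(peptide p | protein k); each row sums to 1.
    """
    n_prot, n_pep = len(m), len(m[0])
    # Read-by-protein likelihoods: L[r][k] = sum_p m[k][p] * q[r][p]
    lik = [[sum(m[k][p] * qr[p] for p in range(n_pep)) for k in range(n_prot)] for qr in q]
    pi = [1.0 / n_prot] * n_prot  # start from the uniform-abundance guess
    for _ in range(iters):
        counts = [0.0] * n_prot
        for lr in lik:
            z = sum(pi[k] * lr[k] for k in range(n_prot))
            if z == 0.0:
                continue  # read unexplained by any protein: skip it
            for k in range(n_prot):
                counts[k] += pi[k] * lr[k] / z  # E-step: responsibility of protein k
        total = sum(counts)
        if total == 0.0:
            return pi
        new_pi = [c / total for c in counts]  # M-step: normalized expected read counts
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi
```

Each iteration cannot decrease the likelihood of the observed reads, and the per-read loop is embarrassingly parallel, which is consistent with the abstract's report that the approach scales to millions of reads on a GPU.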
Availability and implementation: The code and data used to produce all results in this paper are available at https://github.com/JavierKipen/ProtInfGPU.
Title: Protein abundance inference via expectation-maximization in fluorosequencing.
Journal: Bioinformatics Advances, 6(1): vbag053
Pub Date: 2026-02-13; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag044
Ko Ikemoto, Akihiro Fujimoto
Summary: Long-read RNA-seq uncovers complex transcriptome diversity, opening new avenues for isoform-level expression analysis. Nevertheless, the functional diversity of individual isoforms is still poorly understood. We introduce isoespy, an analysis pipeline for integrating isoform structures, differential expression, and functional annotations from long-read RNA-seq data. The workflow integrates third-party open reading frame predictors, juxtaposes isoform expression levels with gene models, and visualizes positional and non-positional user-provided features. We applied isoespy to a transcriptome dataset of hepatocellular carcinoma, identifying differences in isoform usage and predicted protein function. isoespy facilitates the interpretation of transcriptomic complexity through integrated structural and functional visualization.
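One common way to express "differences in isoform usage" is the fraction of a gene's expression contributed by each isoform, compared between conditions. The sketch below computes that generic usage fraction and its shift between two samples; the metric and the function names are illustrative assumptions and are not taken from isoespy's implementation.

```python
from collections import defaultdict
from typing import Dict


def isoform_usage(expr: Dict[str, float], gene_of: Dict[str, str]) -> Dict[str, float]:
    """Usage of each isoform = its expression / total expression of its gene's isoforms."""
    totals: Dict[str, float] = defaultdict(float)
    for iso, value in expr.items():
        totals[gene_of[iso]] += value
    return {iso: (value / totals[gene_of[iso]] if totals[gene_of[iso]] > 0 else 0.0)
            for iso, value in expr.items()}


def usage_shift(expr_a: Dict[str, float], expr_b: Dict[str, float],
                gene_of: Dict[str, str]) -> Dict[str, float]:
    """Change in usage from condition A to condition B (positive = used more in B)."""
    ua, ub = isoform_usage(expr_a, gene_of), isoform_usage(expr_b, gene_of)
    return {iso: ub.get(iso, 0.0) - ua.get(iso, 0.0) for iso in set(ua) | set(ub)}
```

A gene can show an isoform switch (large opposite shifts for two isoforms) even when its total expression is unchanged, which is why usage is examined alongside gene-level differential expression.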
Availability and implementation: isoespy is freely available at https://github.com/kolikem/isoespy.
{"title":"isoespy: an integrated long-read transcriptome workflow for isoform resolution and visualization.","authors":"Ko Ikemoto, Akihiro Fujimoto","doi":"10.1093/bioadv/vbag044","DOIUrl":"10.1093/bioadv/vbag044","url":null,"abstract":"<p><strong>Summary: </strong>Long-read RNA-seq uncovers complex transcriptome diversity, opening new avenues for isoform-level expression analysis. Nevertheless, the functional diversity of individual isoforms is still poorly understood. We introduce isoespy, an analysis pipeline for integrating isoform structures, differential expression, and functional annotations from long-read RNA-seq data. The workflow integrates third-party open reading frame predictors, juxtaposes isoform expression levels with gene models, and visualizes positional and non-positional user-provided features. We applied isoespy to a transcriptome dataset of hepatocellular carcinoma, identifying differences in isoform usage and predicted protein function. isoespy facilitates the interpretation of transcriptomic complexity through integrated structural and functional visualization.</p><p><strong>Availability and implementation: </strong>Isoespy is freely available at https://github.com/kolikem/isoespy.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag044"},"PeriodicalIF":2.8,"publicationDate":"2026-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12935161/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147313028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-13; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag046
Gerald Mboowa, Ivan Sserwadda, Stephen Kanyerezi
Motivation: Antimicrobial resistance surveillance in ESKAPEE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, Enterobacter spp., and Escherichia coli) requires reproducible, portable whole-genome analysis that public health laboratories, including those operating under data-sovereignty constraints, can run on laptops, institutional servers, or cloud backends without local dependency conflicts. rMAP 2.0 addresses these needs using a containerized Workflow Description Language pipeline executed with Cromwell.
Results: rMAP 2.0 standardizes end-to-end bacterial whole-genome analysis (read quality control, trimming, assembly and annotation, resistance/virulence/mobile-element profiling, sequence typing, pangenome inference, and phylogenetic reconstruction) using containerized execution, and generates a single interactive HTML report that collates outputs for rapid review. The workflow supports fully offline execution (including BLAST searches) for data-sovereign deployments and can run on local workstations, institutional servers, and cloud backends where Docker is supported, providing a consistent execution environment without local tool installation. In a representative benchmark of 20 Enterobacterales isolates, rMAP 2.0 completed a cohort run in ∼4.5 hours on an 8-core/16-GB laptop and flagged a record misannotated in public repository metadata (SRR9703249, reclassified from K. pneumoniae to Enterobacter cloacae sequence type 182), while confirming lineage assignments such as E. coli sequence type 131.
Availability and implementation: Source code is available at https://github.com/gmboowa/rMAP-2.0, and example workflow reports are available at https://gmboowa.github.io/rMAP-2.0/.
Title: rMAP 2.0: a modular, reproducible, and scalable WDL-Cromwell-Docker workflow for genomic analysis of ESKAPEE pathogens.
Journal: Bioinformatics Advances, 6(1): vbag046
Pub Date: 2026-02-13; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag042
Jacopo Ronchi, Maria Foti
Motivation: MicroRNAs (miRNAs) play a central role in controlling gene expression, and their abnormal activity is frequently linked to disease. Despite advancements in transcriptomic technologies, elucidating miRNA-mediated mechanisms remains challenging due to methodological limitations and a lack of standardized frameworks.
Results: To overcome these barriers, we developed MIRit, a comprehensive R package designed for the rigorous analysis of miRNA-mRNA interactions. With flexible support for both matched and unmatched datasets, MIRit leverages cutting-edge target identification strategies and applies suitable statistical approaches for each scenario. In this study, we benchmarked the performance of commonly used statistical tests for integrative miRNA analysis and demonstrated the effectiveness of MIRit across three human disease contexts (dilated cardiomyopathy, clear cell renal cell carcinoma, and Alzheimer's disease) by uncovering functionally relevant miRNA-target disruptions consistent with known disease mechanisms. Through its streamlined pipeline and biologically appropriate methods, MIRit enables more reproducible and accurate insights into the complex landscape of post-transcriptional regulation.
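For matched datasets, where each sample has both miRNA and mRNA measurements, a common integrative check is whether a miRNA's expression is negatively correlated with that of its predicted targets. The sketch below applies Pearson correlation with a fixed cutoff; the correlation choice, the threshold of -0.5, and the function names are illustrative assumptions rather than MIRit's R interface, and a real analysis would also test significance and correct for multiple comparisons.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_targets(mirna_expr: Dict[str, List[float]],
                           target_expr: Dict[str, List[float]],
                           targets: Dict[str, List[str]],
                           threshold: float = -0.5) -> List[Tuple[str, str, float]]:
    """Return (miRNA, gene, r) for predicted pairs whose correlation is at or below threshold."""
    hits = []
    for mir, genes in targets.items():
        for gene in genes:
            r = pearson(mirna_expr[mir], target_expr[gene])
            if r <= threshold:
                hits.append((mir, gene, r))
    return sorted(hits, key=lambda h: h[2])  # strongest anticorrelation first
```

Negative correlation is used because a miRNA that actively represses a target should lower that target's mRNA level as the miRNA rises; unmatched designs cannot support this per-sample test, which is why they call for a different statistical approach.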
Availability and implementation: The tool is fully open-source and freely accessible via Bioconductor (https://bioconductor.org/packages/release/bioc/html/MIRit.html), making it readily available to the broader scientific community.
Title: MIRit: an integrative R framework for the identification of impaired miRNA-mRNA regulatory networks in complex diseases.
Journal: Bioinformatics Advances, 6(1): vbag042
Pub Date: 2026-02-13; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag045
Rastko Stojšin, Jinlian Wang, Hongfang Liu
Motivation: O-GlcNAcylation, a dynamic post-translational modification regulated by O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA), influences critical biological processes and is dysregulated in cancers. Direct measurement of O-GlcNAcylation dysregulation is challenging due to its instability and low-throughput nature, limiting large-scale studies. However, the regulatory simplicity of this system and the availability of transcriptomic data enable inference of dysregulation from OGT and OGA expression.
Results: We introduce a nonparametric approach based on kernel density estimation that quantifies O-GlcNAcylation dysregulation from joint OGT and OGA expression. In simulated datasets with varied expression patterns and controlled levels of dysregulation, our method consistently outperformed canonical metrics. In TCGA data from six cancer types, inferred regulation scores were significantly lower in cancer samples than in healthy samples (0.25-0.30 vs. 0.49-0.51), and the two score distributions differed strongly (Kolmogorov-Smirnov P values < 5.95e-11; D-statistics > 0.31). The scores also enabled accurate classification of cancer status (AUROC: 0.71-0.75) and generalized well to external datasets without retraining. This transcriptomics-based framework offers a scalable approach for interpretable quantification of O-GlcNAcylation dysregulation in cancer.
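The abstract does not give the scoring formula, so the following is only a hedged sketch of how a KDE-based regulation score could work, under our own assumptions: a two-dimensional Gaussian KDE with Scott's-rule bandwidths is fitted to healthy reference samples in (OGT, OGA) expression space, and each sample's score is the fraction of reference samples whose estimated density is no higher than its own. The function names, the toy data generator and these modeling choices are illustrative, not the authors' implementation.

```python
import math
import random


def scott_bandwidths(points):
    """Per-dimension Scott's-rule bandwidths for 2-D data: sd * n ** (-1/6)."""
    n = len(points)
    bws = []
    for dim in range(2):
        vals = [p[dim] for p in points]
        mean = sum(vals) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
        if sd == 0:
            raise ValueError("reference expression must vary in both genes")
        bws.append(sd * n ** (-1 / 6))
    return bws


def kde_density(query, points, bws):
    """Product-Gaussian kernel density estimate at one (OGT, OGA) point."""
    hx, hy = bws
    norm = 1.0 / (len(points) * 2 * math.pi * hx * hy)
    total = 0.0
    for px, py in points:
        zx = (query[0] - px) / hx
        zy = (query[1] - py) / hy
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total


def regulation_scores(reference, samples):
    """Hypothetical score: share of reference densities <= the sample's density.

    Samples in the dense core of the healthy OGT/OGA distribution score near 1;
    samples in its tails (a proxy for dysregulation) score near 0.
    """
    bws = scott_bandwidths(reference)
    ref_dens = [kde_density(p, reference, bws) for p in reference]
    scores = []
    for s in samples:
        d = kde_density(s, reference, bws)
        scores.append(sum(rd <= d for rd in ref_dens) / len(ref_dens))
    return scores


def simulate_reference(n=300, seed=0):
    """Toy healthy cohort (assumption): independent Gaussian log-expression values."""
    rng = random.Random(seed)
    return [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(n)]
```

With `healthy = simulate_reference()`, the call `regulation_scores(healthy, [(5.0, 5.0), (7.5, 3.0)])` gives a score close to 1 for the point at the centre of the toy cohort and close to 0 for the point far from it. Under this construction, fresh samples drawn from the reference distribution score roughly uniformly between 0 and 1, and a density-based score needs no assumption about the shape of the joint OGT/OGA distribution.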
Availability and implementation: The code and datasets used in this study are freely available at https://github.com/wonder-ai/O-GlcNAcylation_Project under an open-source license.
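To make the reported statistics concrete, the following generic sketch computes the two-sample Kolmogorov-Smirnov D statistic and the AUROC for two lists of regulation scores. The AUROC uses its Mann-Whitney pairwise interpretation with ties counted as one half. It is not the authors' evaluation code: P values are omitted, and the function names are our own.

```python
def ks_statistic(a, b):
    """Two-sample KS D: the largest gap between the empirical CDFs of a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    best = 0  # integer numerator over n * m, which avoids rounding noise
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        best = max(best, abs(i * m - j * n))
    return best / (n * m)


def auroc(higher_group, lower_group):
    """P(score from higher_group > score from lower_group), ties counted as 1/2.

    This equals the ROC AUC of a classifier that predicts membership in
    higher_group from larger scores (the Mann-Whitney U formulation).
    """
    wins = 0.0
    for hi in higher_group:
        for lo in lower_group:
            if hi > lo:
                wins += 1.0
            elif hi == lo:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))
```

Because cancer samples have lower regulation scores, healthy scores are passed as `higher_group`; this gives the same value as treating cancer as the positive class with negated scores. With the toy lists healthy = [0.9, 0.7, 0.6, 0.5, 0.4] and cancer = [0.3, 0.2, 0.45, 0.1, 0.55], `ks_statistic` returns 0.6 and `auroc(healthy, cancer)` returns 0.88.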
{"title":"A kernel density estimation-based approach for quantifying O-GlcNAcylation dysregulation in cancer from gene expression data.","authors":"Rastko Stojšin, Jinlian Wang, Hongfang Liu","doi":"10.1093/bioadv/vbag045","DOIUrl":"10.1093/bioadv/vbag045","url":null,"abstract":"<p><strong>Motivation: </strong>O-GlcNAcylation, a dynamic post-translational modification regulated by O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA), influences critical biological processes and is dysregulated in cancers. Direct measurement of O-GlcNAcylation dysregulation is challenging due to its instability and low-throughput nature, limiting large-scale studies. However, the regulatory simplicity of this system and the availability of transcriptomic data enable inference of dysregulation from OGT and OGA expression.</p><p><strong>Results: </strong>We introduce a nonparametric kernel density estimation-based approach to quantify O-GlcNAcylation dysregulation using joint OGT and OGA expression. In simulated datasets with varied expression patterns and controlled dysregulation levels, our method consistently outperformed canonical metrics in quantifying dysregulation. In TCGA data from six cancer types, inferred regulation scores were significantly lower in cancer samples (0.25-0.30 vs. 0.49-0.51) and showed strong distributional differences (Kolmogorov-Smirnov <i>P</i> values <5.95e-11; D-statistics >0.31) compared to those from healthy samples. The scores also allow for accurate classification of cancer status (AUROC: 0.71-0.75) and generalized well to external datasets without retraining. 
This transcriptomics-based framework offers a scalable approach for interpretable quantification of O-GlcNAcylation dysregulation in cancer.</p><p><strong>Availability and implementation: </strong>The code and datasets used in this study are freely available at https://github.com/wonder-ai/O-GlcNAcylation_Project under an open-source license.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag045"},"PeriodicalIF":2.8,"publicationDate":"2026-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12980335/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147464075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-11; eCollection Date: 2026-01-01; DOI: 10.1093/bioadv/vbag041
Li Zhang, Zhenying Ding, Nengjun Yi
Motivation: BCGLMs is a freely available R package that provides functions for setting up and fitting Bayesian compositional models for continuous, binary, ordinal and survival responses. It also includes models with random effects that capture accumulated small sample-related effects, improving prediction accuracy. The package includes tools for summarizing results from fitted models both numerically and graphically. Built on top of the widely used brms package, BCGLMs enables users to incorporate phylogenetic relationships between microbiome taxa into the modeling framework. Overall, the BCGLMs package offers a flexible and powerful set of tools for analyzing compositional microbiome data.
Availability and implementation: The package is publicly available via GitHub https://github.com/Li-Zhang28/BCGLMs.
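Compositional regression models are commonly written as log-contrast models, where the linear predictor is eta = b0 + sum_j b_j * log(x_j) for relative abundances x_j, subject to the constraint sum_j b_j = 0. We assume, without confirmation from the abstract, that BCGLMs builds on this formulation. The sketch below is plain Python rather than the package's R/brms interface, and it enforces the constraint by centring the coefficients, whereas a Bayesian implementation may impose it differently (for example, softly through a prior). It shows why the constraint matters: with coefficients summing to zero, the predictor is the same whether computed from raw counts or from proportions, so sequencing depth drops out.

```python
import math


def centre_coefficients(beta):
    """Project coefficients onto the sum-to-zero constraint sum(beta) == 0."""
    mean = sum(beta) / len(beta)
    return [b - mean for b in beta]


def add_pseudocount(counts, pseudocount=0.5):
    """Shift every count (assumed pseudocount of 0.5) so that log() is defined for zeros."""
    return [c + pseudocount for c in counts]


def to_proportions(values):
    """Closure: rescale positive values to relative abundances summing to 1."""
    total = sum(values)
    return [v / total for v in values]


def log_contrast(values, beta, intercept=0.0):
    """eta = intercept + sum_j beta_j * log(values_j), with beta centred first."""
    centred = centre_coefficients(beta)
    return intercept + sum(b * math.log(v) for b, v in zip(centred, values))


def predicted_probability(values, beta, intercept=0.0):
    """Binary response: inverse-logit of the log-contrast predictor."""
    eta = log_contrast(values, beta, intercept)
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, with counts [120, 30, 0, 850] shifted by the pseudocount and hypothetical coefficients [0.8, -0.3, 0.1, 0.4], `log_contrast` returns the same value (up to floating-point rounding) for the shifted counts, their proportions, or the counts multiplied by 10, because the centred coefficients cancel any common scale factor.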
{"title":"BCGLMs: Bayesian modeling for disease prediction using compositional microbiome features.","authors":"Li Zhang, Zhenying Ding, Nengjun Yi","doi":"10.1093/bioadv/vbag041","DOIUrl":"10.1093/bioadv/vbag041","url":null,"abstract":"<p><strong>Motivation: </strong><b>BCGLMs</b> is a freely available R package that provides functions for setting up and fitting Bayesian compositional models for continuous, binary, ordinal and survival responses. It also includes models with random effects to capture sample-related accumulated small effects, improving prediction accuracy. The package includes tools for summarizing results from fitted models both numerically and graphically. Built on top of the widely used <b>brms</b> package, <b>BCGLMs</b> enable users to incorporate phylogenetic relationships between microbiome taxa into the modeling framework. Overall, <b>BCGLMs</b> package offers a flexible and powerful set of tools for analyzing compositional microbiome data.</p><p><strong>Availability and implementation: </strong>The package is publicly available via GitHub https://github.com/Li-Zhang28/BCGLMs.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag041"},"PeriodicalIF":2.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12935159/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147313026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}