Pub Date: 2026-03-11 DOI: 10.1093/bioinformatics/btag114
Fabiana Rodrigues de Goes, Matheus Fujimura Soares, Vitor Gregorio, Bruno Thiago de Lima Nichio, Alisson Gaspar Chiquitto, Flavia Lombardi Lopes, Mark Basham, Douglas Silva Domingues, Alexandre Rossi Paschoal
Motivation: MirtronDB provides a comprehensive and up-to-date resource for advancing mirtron research within RNA biology. Therefore, maintaining a specialized and continuously updated resource for mirtrons is essential to support ongoing discoveries and to serve as a key reference for researchers investigating the roles of mirtrons.
Results: Here, we present mirtronDB 2.0, an enhanced version that expands both content and functionality. This version integrates mirtron data published between 2017 and 2025, increasing the number of documented mirtrons across various species. In addition, it incorporates newly predicted mirtrons identified through a robust pipeline that combines advanced bioinformatics and machine learning approaches, with specific coverage of six mammalian species. We have introduced new website features, including an interactive dashboard to enhance usability and facilitate intuitive data exploration. These rigorous updates consolidate mirtronDB as a key mirtron resource for the RNA biology community.
Availability and implementation: mirtronDB can be found under http://mirtrondb.cp.utfpr.edu.br/. The complete content of Database 2.0 and the source code for the analyses are also freely available in the FigShare repository: https://figshare.com/articles/dataset/MirtronDB_version2/29344775.
Contact: Corresponding authors: Alexandre Rossi Paschoal, Rosalind Franklin Institute, Harwell Science and Innovation Campus, Didcot, OX11 0QS, UK; Department of Computer Science, Federal University of Technology-Parana, Cornelio Procopio, Brazil. Email: alexandre.paschoal@rfi.ac.uk or paschoal@utfpr.edu.br; Douglas Silva Domingues, Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil. Email: dougsd@usp.br.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: mirtronDB 2.0: enhanced database with novel mirtron discoveries. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-11 DOI: 10.1093/bioinformatics/btag116
Brian Tjaden
Motivation: Intrinsic terminators are the primary mechanism of transcription termination in bacteria. These terminators influence transcript stability and play key roles in gene regulation. Existing computational methods for genome-wide terminator identification have been designed and evaluated on a small number of experimentally evidenced terminators, often from only one or two organisms.
Results: We present TerminatorNet, a system for identifying intrinsic transcription terminators throughout bacteria. TerminatorNet uses a neural network model trained on a large set of experimentally characterized transcription terminators from a variety of bacterial genomes. TerminatorNet identifies 98% of terminators and has a false positive rate of 3%, substantially better than existing approaches. TerminatorNet commonly identifies terminators at the ends of operons. We applied TerminatorNet to thousands of genomes across the taxonomic spectrum of prokaryotes, creating a repository of tens of millions of terminators. We observe heavy use of intrinsic termination in some groups, such as Bacillota, and rare use in other groups such as archaea. We also observe a wealth of instances of DNA uptake signal sequences, important components of transformation specificity for some competent bacteria, in terminators identified in Neisseriaceae and Pasteurellaceae.
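The two headline figures above correspond to standard confusion-matrix ratios. As a minimal sketch, assuming the 98% refers to sensitivity (TP / (TP + FN)) and the 3% to the false positive rate (FP / (FP + TN)), which the abstract does not spell out, they could be computed as follows. The function name and the counts are hypothetical and are not taken from TerminatorNet.

```python
def sensitivity_and_fpr(tp, fn, fp, tn):
    """Return (sensitivity, false positive rate) from confusion-matrix counts.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = TP / (TP + FN)
      false positive rate = FP / (FP + TN)
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical counts chosen only to illustrate the arithmetic.
sens, fpr = sensitivity_and_fpr(tp=980, fn=20, fp=30, tn=970)
print(f"sensitivity={sens:.2f}, false positive rate={fpr:.2f}")
```

Reporting both ratios matters because a predictor can trivially reach high sensitivity by calling everything a terminator; the false positive rate on known negatives is what exposes that.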
Availability: TerminatorNet and its repository of identifications are available for use via a webserver: https://cs.wellesley.edu/~btjaden/TermNet. The source code is available at GitHub https://github.com/btjaden/TerminatorNet and Zenodo https://doi.org/10.5281/zenodo.18406126.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: TerminatorNet: comprehensive identification of intrinsic transcription terminators in bacteria. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-10 DOI: 10.1093/bioinformatics/btag117
Minindu Weerakoon, Hai Vu, Reza Behboudi, Haynes Heaton
Motivation: Accurate demultiplexing of pooled single-cell RNA-seq (scRNA-seq) data is critical for large-scale studies. However, existing methods like vireo, while effective up to ∼16 donors, often struggle with poor clustering due to local optima as donor numbers rise. In high-donor scenarios, overlapping genotypes, a dense genotype space, and increased doublet formation make demultiplexing challenging, requiring methods that are robust to sparse, high-dimensional data and maintain reliable accuracy even as sample complexity grows.
Results: We present an enhanced version of souporcell capable of demultiplexing up to 64 donors. The method uses 10x merge for initialization, K-Harmonic Means for robust clustering, and iterative refinement with reinitialization of low-quality clusters and locking of high-quality ones. Compared to vireo, overclustered vireo, and the original souporcell, our approach completely eliminates incorrectly merged clusters and achieves consistently high Adjusted Rand Index (ARI) scores across various doublet rates, demonstrating improved accuracy and scalability.
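The Adjusted Rand Index used in the comparison above can be computed from the contingency table between true donor labels and predicted cluster labels. The following is a minimal pure-Python sketch of the standard pair-counting ARI formula, not code from souporcell; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: labelings agree trivially

    # Count cells per (true donor, predicted cluster) pair and per margin.
    pair_counts = Counter(zip(truth, predicted))
    row_counts = Counter(truth)
    col_counts = Counter(predicted)

    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_cells - expected) / (maximum - expected)


donors = [0, 0, 1, 1]
print(adjusted_rand_index(donors, [1, 1, 0, 0]))  # same partition, renamed clusters
print(adjusted_rand_index(donors, [0, 0, 0, 0]))  # two donors merged into one cluster
```

Because ARI is invariant to cluster renaming and corrected for chance, a wrongly merged cluster lowers the score even when most cells sit with cells from the same donor, which is why it suits the merged-cluster failure mode described above.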
Availability: Souporcell3 is freely available under the MIT open-source license at https://github.com/wheaton5/souporcell.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Souporcell3: Robust Demultiplexing for High-Donor Single-Cell RNA-seq Datasets. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-10 DOI: 10.1093/bioinformatics/btag118
Marta Portasany-Rodríguez, Gonzalo Soria-Alcaide, Elena G Sánchez, Mariya Ivanova, Ana Gómez, Reyes Giménez, Jaanam Lalchandani, Gonzalo García-Aguilera, Silvia Alemán-Arteaga, Cristina Saiz-Ladera, Manuel Ramírez-Orellana, Jorge Garcia-Martinez
Summary: PULPO v1.0 is a novel, fully automated pipeline designed for preprocessing raw Optical Genome Mapping (OGM) data and extracting mutational signatures from it. Built using Snakemake and executed within an isolated, Conda-managed environment, PULPO transforms complex cytogenetic alterations, captured at ultra-high resolution, into Catalogue Of Somatic Mutations In Cancer (COSMIC) mutational signatures. This approach not only enables researchers to work directly from raw OGM inputs but also streamlines the traditionally complex process of signature extraction, making advanced oncogenomic analyses accessible to users with varying levels of bioinformatics expertise. By facilitating the integration of comprehensive structural variant (SV) and copy number variant (CNV) data with established signature catalogues, PULPO paves the way for improved diagnostic accuracy and personalized therapeutic strategies.
Availability: The pipeline is open source and freely available under the MIT License at https://github.com/OncologyHNJ/PULPO-v.1.0 and archived on Zenodo at https://zenodo.org/records/17749097.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: PULPO: Pipeline of understanding large-scale patterns of oncogenomic signatures. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-10 DOI: 10.1093/bioinformatics/btag120
Chenglong Sang, Cheng Peng
Motivation: Characterizing neuronal connectomes provides a route to understanding the basis of neural circuits in the brain, one of the central missions of neuroscience. However, mapped connectivity lacks molecular information, which obscures the genes underlying the connectomes. Whole-brain spatial transcriptomics data provide an opportunity to predict and understand brain connectivity, but there is no method to process these datasets into a consistent data format for integrative analysis.
Results: In this work, we developed software that processes different kinds of mouse brain connectivity data together with spatial transcriptomics data in consistent brain regions to define connectivity paths and strengths. Using this data framework, we then applied a long short-term memory network to predict connectivity strengths from the spatial transcriptomics data. We evaluated the model in different ways, and the results showed that it accurately predicted connectivity strengths and helped select important genes potentially involved in the regulation, establishment or maintenance of brain connectivity.
Availability: The software is freely available on GitHub (https://github.com/CPenglab/BrainConnect) and PyPI (https://pypi.org/project/BrainConnect). An archived version is available at https://doi.org/10.5281/zenodo.18440094.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: BrainConnect: processing brain connectivity and spatial transcriptomics data for integrative analysis. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-10 DOI: 10.1093/bioinformatics/btag125
James T Robinson, Helga Thorvaldsdottir, Jill P Mesirov
Summary: We present igv-reports, a command-line tool to create standalone HTML pages embedding interactive genomic visualizations of read alignments and associated annotations to support variant inspection workflows. The reports contain all data and code required for visualization of the variant sites, with no dependencies on the input data files.
Availability and implementation: igv-reports is a command-line application written in Python. It is freely available at https://github.com/igvteam/igv-reports under an MIT license.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Igv-reports: Embedding interactive genomic visualizations in HTML reports to aid variant review. Journal: Bioinformatics (Oxford, England).
Motivation: Protein language models (pLMs) are critical for modeling antibody-antigen interactions, yet sequence-based affinity prediction remains a key challenge, particularly when structural data are scarce. Existing methods often struggle to fully exploit sequence information, limiting their applicability across diverse antibody formats such as single-domain antibodies (sdAbs).
Results: We propose DLP-Affinity, a dual-level deep learning framework for accurate sequence-based affinity prediction. It leverages two complementary modules: Residue-to-Residue (R2R) to capture local interface contacts, and Global Stochastic Projection Embedding (GSPE) to represent global protein properties. Utilizing a fine-tuned protein language model, our approach achieves state-of-the-art performance on the general AB-Bind dataset (reducing mean absolute error by up to 20.9%) and delivers highly competitive results on the sdAb-DB dataset. This provides a robust tool for sequence-based antibody affinity prediction.
Availability and implementation: The source code and datasets for DLP-Affinity are freely available at https://github.com/Zy-Wang-bit/DLP_Affinity and archived on Zenodo at https://doi.org/10.5281/zenodo.18437656.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Predicting Antibody-Antigen Affinity with a Dual-Level Representation Model. Authors: Ziyang Wang, Yu Zhang, Youli Zhang, Jianwei Huang, Xiaoli Lu, Xiaoping Min, Shengxiang Ge, Jun Zhang, Ningshao Xia. Journal: Bioinformatics (Oxford, England). Pub Date: 2026-03-10. DOI: 10.1093/bioinformatics/btag109.
Pub Date: 2026-03-09 DOI: 10.1093/bioinformatics/btag115
Yiping Zou, Jiaqi Luo, Shuaicheng Li
Motivation: B-cell receptors (BCRs) and gene expression profiles are two distinct yet complementary modalities of B cells. However, most analyses treat them independently. Here, we present CoMBCR, a B-cell embedding tool that co-learns BCRs and gene expressions, representing data within a unified latent space for downstream analysis.
Results: We applied CoMBCR to 126,791 B cells from diverse datasets with matched BCRs and gene expressions. First, CoMBCR outperforms methods that encode BCRs alone in capturing B-cell biological features, achieving an improvement of at least 0.1 in Matthews Correlation Coefficient on a SARS-CoV-2 binding prediction task. Second, CoMBCR reveals active immune responses and CDR3 motif preferences through modality gap analysis in SARS-CoV-2-specific memory B cells. Moreover, when supported by spatial transcriptomics data, CoMBCR accurately traces the developmental trajectories of malignant B cells and uncovers transcriptional patterns associated with their survival in lymphoma patients.
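For readers unfamiliar with the metric behind the binding prediction comparison above, the Matthews Correlation Coefficient for a binary task can be sketched in a few lines. This is a generic illustration of the standard formula, not code from CoMBCR; the function name and the toy binder/non-binder labels are assumptions.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (truthy = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any margin of the table is empty
    return (tp * tn - fp * fn) / denom


# Toy labels: 1 = binds the antigen, 0 = does not bind (hypothetical data).
binders = [1, 1, 0, 0]
print(matthews_corrcoef(binders, [1, 1, 0, 0]))  # perfect agreement
print(matthews_corrcoef(binders, [1, 0, 1, 0]))  # no better than chance
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, so it stays informative when binders are much rarer than non-binders; on that scale an improvement of 0.1 is a sizeable shift.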
Availability and implementation: The CoMBCR software is publicly available under the MIT License at https://github.com/deepomicslab/CoMBCR.git.
Supplementary information: Supplementary files are available at Bioinformatics online.
Title: CoMBCR: Co-Learning Multi-Modalities of BCRs and Gene Expressions. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-03-09 DOI: 10.1093/bioinformatics/btag108
Susan E Ott, Giang N Le, Sayed J Mohammadi, Jesse Mittertreiner, Erica M Pasini, Ronald E Bontrop, Natasja G de Groot, Jesse Bruijnesteijn
Motivation: Accurate annotation of germline immunoglobulin (IG) and T cell receptor (TCR) loci is critical for understanding adaptive immunity.
Results: VDJ-Insights provides a user-friendly software package for characterizing these complex immune regions. In addition, it assesses gene segment functionality, identifies recombination signal sequences (RSS), and annotates complementarity-determining regions 1 and 2 (CDR1, CDR2). VDJ-Insights achieved over 99% concordance with curated annotations from multiple species, outperforming existing annotation tools. When applied to 95 haplotypes from the Human Pangenome Reference Consortium, VDJ-Insights identified 652 and 275 novel IG and TCR alleles, respectively, highlighting its scalability for large immunogenetic studies.
Availability and implementation: The datasets and software package are available in the VDJ-insights repository, https://github.com/BPRC-Bioinfo and https://doi.org/10.5281/zenodo.17588835. Additional intermediate datasets used and analysed during the current study are available from the corresponding authors upon reasonable request.
Supplementary information: Supplementary data are available at Bioinformatics online.
{"title":"VDJ-Insights: simplifying the annotation of genomic IG and TCR regions.","authors":"Susan E Ott, Giang N Le, Sayed J Mohammadi, Jesse Mittertreiner, Erica M Pasini, Ronald E Bontrop, Natasja G de Groot, Jesse Bruijnesteijn","doi":"10.1093/bioinformatics/btag108","DOIUrl":"https://doi.org/10.1093/bioinformatics/btag108","url":null,"abstract":"<p><strong>Motivation: </strong>Accurate annotation of germline immunoglobulin (IG) and T cell receptor (TCR) loci is critical for understanding adaptive immunity.</p><p><strong>Results: </strong>VDJ-Insights provides a user-friendly software package for characterizing these complex immune regions. In addition, it assesses gene segment functionality, identifies recombination signal sequences (RSS), and annotates complementarity-determining regions 1 and 2 (CDR1, CDR2). VDJ-Insights achieved over 99% concordance with curated annotations from multiple species, outperforming existing annotation tools. When applied to 95 haplotypes from the Human Pangenome Reference Consortium, VDJ-Insights identified 652 and 275 novel IG and TCR alleles, respectively, highlighting its scalability for large immunogenetic studies.</p><p><strong>Availability and implementation: </strong>Datasets and software package are available in the VDJ-insights repository, https://github.com/BPRC-Bioinfo and https://doi.org/10.5281/zenodo.17588835. 
Additional intermediate datasets used and analysed during the current study are available from the corresponding authors upon reasonable request.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147391879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-09 DOI: 10.1093/bioinformatics/btag113
Kyle C Weber, Chenlin Lu, Roberto Vera Alvarez, Bruce D Pascal, Anum Glasgow
Motivation: Hydrogen/deuterium exchange-mass spectrometry (HX-MS) is a rapidly expanding technique used to investigate protein conformational ensembles. The growing popularity and utility of HX-MS have driven the development of diverse instrumentation and software, resulting in inconsistent, non-standardized data analysis and representation. Most HX-MS data formats also store only mean deuteration values rather than full isotopic mass spectra, which reduces the information content of the data and limits downstream quantitative analysis.
Results: Inspired by reliable protein structure and genomics data formats, we present HXMS, a unified, lightweight, scalable, and human-readable file format for HX-MS data. The HXMS format preserves the isotopic mass envelopes for all peptides, captures the full experimental time-course including fully deuterated control samples, and contains all other key information. It supports multimodal distributions, post-translational modifications (PTMs), and experimental replicates. To promote compatibility with existing HX-MS workflows, we also developed PFLink, a Python package that converts exported data files from commonly used HX-MS software to the HXMS format. PFLink and the HXMS format will enable quantitative, higher-resolution data processing, improved data sharing and storage among HX-MS practitioners, future machine learning applications, and further developments in HX-MS analysis.
Availability and implementation: PFLink can be installed locally from HuggingFace, where documentation is also provided, or used online at HuggingFace (https://huggingface.co/spaces/glasgow-lab/PFlink). The supplementary information includes sample input files, sample HXMS files, and a generic unfilled PFLink custom CSV file that users may populate with key experimental conditions and results, which can then be read and converted into the HXMS format.
Supplementary information: Supplementary data are available at Bioinformatics online.
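Illustrative note: the Motivation's point that a mean deuteration value carries less information than the full isotopic envelope can be seen in a small sketch. The envelopes and the `centroid` helper below are hypothetical and are not taken from the HXMS format or from PFLink; the mass axis is simplified to a count of incorporated deuterons.

```python
def centroid(masses, intensities):
    """Intensity-weighted mean of an isotopic envelope (the 'mean deuteration')."""
    total = sum(intensities)
    return sum(m * i for m, i in zip(masses, intensities)) / total

# Hypothetical envelopes on a shared axis (number of incorporated deuterons).
masses = [0, 1, 2, 3, 4]
unimodal = [0.0, 0.25, 0.5, 0.25, 0.0]  # one population centred at 2
bimodal = [0.5, 0.0, 0.0, 0.0, 0.5]     # two populations, at 0 and at 4

c_uni = centroid(masses, unimodal)
c_bi = centroid(masses, bimodal)

# Both envelopes collapse to the same mean even though their shapes differ.
print(c_uni, c_bi)
```

Under these assumptions the two envelopes give the same mean, so a format that stores only that mean cannot distinguish a single population from a bimodal one, whereas storing the full envelope can.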
{"title":"HXMS: a standardized file format for HX-MS data.","authors":"Kyle C Weber, Chenlin Lu, Roberto Vera Alvarez, Bruce D Pascal, Anum Glasgow","doi":"10.1093/bioinformatics/btag113","DOIUrl":"10.1093/bioinformatics/btag113","url":null,"abstract":"<p><strong>Motivation: </strong>Hydrogen/deuterium exchange-mass spectrometry (HX-MS) is a rapidly expanding technique used to investigate protein conformational ensembles. The growing popularity and utility of HX-MS has driven the development of diverse instrumentation and software, resulting in inconsistent, non-standardized data analysis and representation. Most HX-MS data formats also employ only mean deuteration representations of the data rather than full isotopic mass spectra, which reduces the information content of the data and limits downstream quantitative analysis.</p><p><strong>Results: </strong>Inspired by reliable protein structure and genomics data formats, we present HXMS, a unified, lightweight, scalable, and human-readable file format for HX-MS data. The HXMS format preserves the isotopic mass envelopes for all peptides, captures the full experimental time-course including fully deuterated control samples, and contains all other key information. It supports multimodal distributions, post-translational modifications (PTMs), and experimental replicates. To promote compatibility with existing HX-MS workflows, we also developed PFLink, a Python package that converts exported data files from commonly used HX-MS software to the HXMS format. PFLink and the HXMS format will enable quantitative, higher-resolution data processing, improved data sharing and storage among HX-MS practitioners, future machine learning applications, and further developments in HX-MS analysis.</p><p><strong>Availability and implementation: </strong>PFLink is publicly available to install locally on HuggingFace, alongside documentation, or use online at HuggingFace (https://huggingface.co/spaces/glasgow-lab/PFlink). 
The supplementary information includes sample input files, sample HXMS files, and a generic unfilled PFlink custom CSV file that users may populate with key experimental conditions and results, which can then be read and converted into the HXMS format.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147391928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}