Pub Date: 2026-03-01 | Epub Date: 2026-01-21 | DOI: 10.1016/j.immuno.2026.100065
Daria Balashova, Robert Frank, Svetlana Kuzyakina, Dominique Weltevreden, Philippe A. Robert, Geir Kjetil Sandve, Victor Greiff
The accurate prediction of antibody-antigen binding is crucial for developing antibody-based therapeutics and advancing immunological research. Library-on-library approaches, in which many antigens are probed against many antibodies, can identify specific interacting pairs. Machine learning models can predict target binding by analyzing the many-to-many relationships between antibodies and antigens. However, these models struggle to predict interactions when the test antibodies and antigens are not represented in the training data, a scenario known as out-of-distribution prediction. Generating experimental binding data is costly, which limits the availability of comprehensive datasets. Active learning can reduce costs by starting with a small labeled subset of the data and iteratively expanding it. Few active learning approaches can handle data with many-to-many relationships, such as those obtained from library-on-library screening. In this study, we adapted twelve active learning strategies for antibody-antigen binding prediction in a library-on-library setting and evaluated their out-of-distribution performance using the Absolut! simulation framework. We found that three of the twelve algorithms tested outperformed, modestly but significantly, the baseline in which randomly selected data are iteratively labeled. The best algorithm reduced the number of required antigen mutant variants by up to 12.5% compared with the random baseline. These findings demonstrate that active learning can improve experimental efficiency in a library-on-library setting and advance antibody-antigen binding prediction.
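The selection step that separates an active learning strategy from the random baseline can be sketched as follows. This is a minimal, hypothetical illustration, not one of the twelve strategies from the study. It assumes a model that outputs a binding probability for every (antibody, antigen) pair and that whole antigen variants are queried, since the reported saving is counted in antigen mutant variants. Pair uncertainties are averaged per antigen using a margin-style score, chosen here only for simplicity; the function names and identifiers are placeholders.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (antibody_id, antigen_id); identifiers are hypothetical placeholders.
Pair = Tuple[str, str]


def margin_uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_antigens_by_uncertainty(
    probs: Dict[Pair, float], labeled: Set[str], k: int
) -> List[str]:
    """Pick the k unlabeled antigens whose antibody pairings are, on average,
    the most uncertain under the current model."""
    per_antigen: Dict[str, List[float]] = defaultdict(list)
    for (_antibody, antigen), p in probs.items():
        if antigen not in labeled:
            per_antigen[antigen].append(margin_uncertainty(p))
    ranked = sorted(
        per_antigen,
        key=lambda ag: sum(per_antigen[ag]) / len(per_antigen[ag]),
        reverse=True,
    )
    return ranked[:k]


def select_antigens_randomly(
    probs: Dict[Pair, float], labeled: Set[str], k: int, rng: random.Random
) -> List[str]:
    """Random baseline: choose k unlabeled antigens uniformly at random."""
    pool = sorted({antigen for (_antibody, antigen) in probs if antigen not in labeled})
    return rng.sample(pool, min(k, len(pool)))
```

In a full loop, the selected antigens would be screened against the antibody library, their pairs added to the labeled set, and the model retrained before the next round.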
Title: Active learning for improving out-of-distribution lab-in-the-loop experimental design. Immunoinformatics (Amsterdam, Netherlands), vol. 21, Article 100065.
Pub Date: 2026-03-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.immuno.2026.100066
Meng Wang, Wengyao Jiang, Yuval Kluger, Steven H. Kleinstein, Gisela Gabernet
Large language models have been developed to capture relevant features of adaptive immune receptors, each with unique potential applications. However, the diversity of available models makes them difficult to access and use in downstream applications. Here we present AMULETY (Adaptive imMUne receptor Language model Embedding Tool), a Python package that generates language model embeddings for adaptive immune receptor sequences, enabling users to leverage the strengths of different models without complex configuration. AMULETY provides functions for embedding adaptive immune receptor amino acid sequences with pre-trained protein or antibody language models, for paired B-cell receptor heavy-light chains, paired T-cell receptor alpha-beta or gamma-delta chains, or single-chain sequences. We illustrate the variability of the embedding space across several embeddings using a dataset of antibodies binding several SARS-CoV-2 epitopes and a dataset of T-cell receptors binding several epitopes, and show that different models may capture different aspects of the distinctions between epitope groups. AMULETY is freely available under the GPLv3 license from https://github.com/immcantation/amulety or via pip from the Python Package Index (PyPI) at https://pypi.org/project/amulety/.
Title: AMULETY: A Python package to embed adaptive immune receptor sequences. Immunoinformatics (Amsterdam, Netherlands), vol. 21, Article 100066.
Pub Date: 2025-12-01 | Epub Date: 2025-10-21 | DOI: 10.1016/j.immuno.2025.100063
Kerry A. Mullan, Sebastiaan Valkiers, Nicky de Vrij, Chen Li, Sara Verbandt, Ting Pu, Pieter Meysman
The current state of single-cell transcriptomic interrogation typically consists of unsupervised clustering followed by expert opinion-based annotation. The underlying assumption is that this process accurately identifies transcriptional differences between cellular subsets and can therefore, for example, separate CD8+ T cells from CD4+ T cells. However, this widely applied assumption that the clustering reflects T-cell biology has never been validated. We used a large T-cell atlas (V2) that combined twelve 10x Genomics single-cell T-cell transcriptomics datasets (∼500 K cells), as well as an independent CITE-seq dataset, to assess whether the unsupervised clustering produced by Seurat reflected the biology. Annotations were then evaluated using the expression of key marker genes. The main T-cell markers CD8 and CD4 were mixed in most clusters, regardless of the feature selection and of whether principal components, Harmony components, or features were used. The factors driving the clustering were related to cellular functions (glucose metabolism) and to T-cell receptor (TCR), immunoglobulin, and HLA transcripts, rather than to typical markers. Contrary to current assumptions, the clustering was not driven by T-cell phenotypes and could not accurately separate CD4+ from CD8+ T cells, let alone their sub-classifications. This implies that many T cells would be misclassified under the standard cluster-based annotation approach. Methods relying on unsupervised clustering should be used with care, as improper handling can misrepresent the data, and alternatives such as semi-supervised approaches with TCR-seq or protein-based annotations should be preferred.
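One way to quantify the CD4/CD8 mixing described above is to call each cell from marker expression and measure, per cluster, the fraction of lineage-resolved cells that belong to the minority lineage (0.0 for a pure cluster, 0.5 for a fully mixed one). The sketch below is hypothetical and not the study's evaluation: it assumes a single CD4 value and a single CD8 value per cell and a simple positivity threshold, whereas real single-cell data are sparse and dropouts leave many cells ambiguous.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def marker_call(cd4: float, cd8: float, threshold: float = 0.0) -> str:
    """Assign a lineage from marker expression (hypothetical threshold rule)."""
    pos4, pos8 = cd4 > threshold, cd8 > threshold
    if pos4 and not pos8:
        return "CD4"
    if pos8 and not pos4:
        return "CD8"
    return "ambiguous"  # double-positive or double-negative (e.g. dropout)


def cluster_mixing(
    clusters: Sequence[str],
    cd4: Sequence[float],
    cd8: Sequence[float],
    threshold: float = 0.0,
) -> Dict[str, float]:
    """Per cluster, the minority-lineage fraction among CD4- or CD8-called cells.
    Clusters with no lineage-resolved cells get NaN."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cluster, a, b in zip(clusters, cd4, cd8):
        counts[cluster][marker_call(a, b, threshold)] += 1
    mixing = {}
    for cluster, cnt in counts.items():
        n4, n8 = cnt["CD4"], cnt["CD8"]
        total = n4 + n8
        mixing[cluster] = min(n4, n8) / total if total else float("nan")
    return mixing
```

A cluster-based annotation is only safe for clusters whose mixing score is close to zero.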
Title: Where single-cell transcriptomics fails T cells: The misuse of unsupervised clustering for T-cell annotation. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100063.
Pub Date: 2025-12-01 | Epub Date: 2025-09-16 | DOI: 10.1016/j.immuno.2025.100060
Anna Niarakis, Gary An, Luiz Ladeira, Noriko F. Hiroi, Athina Papadopoulou, Francis P. Crawley, Niloofar Nikaein, Laurence Calzone, Eirini Tsirvouli, Hasan Balci, Marina Esteban Medina, Lorenzo Veschini, Ozan Ozisik, Francesco Messina, Malvina Marku, Van Du T. Tran, Arnau Montagud, Nikola Schlosserova, Yashwanth Subbannayya, Martina Kutmon, Reinhard Laubenbacher
Digital twins, initially developed for industrial applications, are poised to drive significant advances in medicine and healthcare. They have shown promise for drug development and personalised care, especially in cardiovascular diagnostics and insulin-dependent diabetes management. A particularly compelling application lies in immune responses and immune-mediated diseases, given the immune system’s essential role in preserving human health, from fighting infections to managing autoimmune diseases. Creating Immune Digital Twins (IDTs) holds great promise for medicine and healthcare. At the same time, developing a reliable and robust IDT presents significant challenges due to the inherent complexity and polymorphism of the human immune system, the difficulty of measuring patients’ immune state in vivo, and the intrinsic difficulties of modelling complex biological systems and processes.
The Working Group “Building Immune Digital Twins” (BIDT WG) aims to address these challenges by fostering transdisciplinary collaborations among immunologists, clinicians, experimentalists, computational biologists, and engineers. The international network is leveraging its cross-disciplinary expertise to build the components required for a working IDT model. Moreover, the BIDT WG focuses on creating an open-access model repository for publicly available immune-related computational models and their required metadata. The group is also active in cataloguing open-access tools, methodologies, and software to identify interoperability gaps in the current modelling landscape.
Consequently, this work can drive transformative innovations in precision medicine, unlocking new possibilities for the diagnosis, treatment, and management of immune-mediated diseases.
Title: Building immune digital twins: An international and transdisciplinary community effort. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100060.
Pub Date: 2025-12-01 | Epub Date: 2025-08-28 | DOI: 10.1016/j.immuno.2025.100058
James M. Heather, Ayelet Peres, Gur Yaari, William Lees
The rise of T cell receptor (TCR) sequencing technologies is driving both new understandings of the immune system and the development of novel clinical platforms. Such analyses rely on comparing recombined TCR sequences to unrearranged germline reference sequences during V(D)J annotation. In this study, we observed that, despite the importance of this step in TCR analysis, most published studies do not properly report the reference used. We use public datasets to illustrate why references should be explicitly specified: using IMGT/GENE-DB as an example, we document how the reference set changes over time. Furthermore, we illustrate how prescriptivist interpretations of reference metadata may be obscuring rather than illuminating TCR biology, and we demonstrate the need for full V gene sequencing to unambiguously determine the final translated TCR polypeptide sequence. In summary, we argue that to ensure the accuracy and reproducibility of TCR sequencing, an ever more pressing task as more TCR-based diagnostics and therapeutics are developed, we should all take more care with the development, use, and reporting of the TCR germline references used in our science.
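Explicit reporting can be made concrete by recording a digest of the exact reference set used for V(D)J annotation and by diffing reference releases. The sketch below is an illustration under assumptions: the FASTA parsing is simplified (the whole header line is used as the key, while real IMGT/GENE-DB headers carry several `|`-separated fields from which the allele name would need extracting), the digest scheme is an ad hoc choice rather than a community standard, and the allele names and sequences in the test are toy values, not real germline entries.

```python
import hashlib
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: uppercase sequence} (simplified)."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def fingerprint(records: Dict[str, str]) -> str:
    """Order-independent SHA-256 digest of a reference set, for methods sections."""
    h = hashlib.sha256()
    for name in sorted(records):
        h.update(f"{name}\t{records[name]}\n".encode("utf-8"))
    return h.hexdigest()


def diff_references(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Alleles added, removed, or changed in sequence between two releases."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Reporting the digest alongside the database name and download date would let a reader confirm that a re-downloaded reference matches the one used.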
Title: The gremlin in the works: why T cell receptor researchers need to pay more attention to germline reference sequences. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100058.
Pub Date: 2025-12-01 | Epub Date: 2025-11-07 | DOI: 10.1016/j.immuno.2025.100064
Dominik Grabarczyk, Mikołaj Kocikowski, Maciej Parys, Douglas R. Houston, Ted Hupp, Javier Antonio Alfaro, Shay B. Cohen
Antibody translation across species offers a compelling strategy to extend the vast and expensive investments in human therapeutic antibodies to veterinary oncology, with applications in both veterinary medicine and comparative oncology.
While precise, low-immunogenicity treatments are essential for canine cancer care, traditional species conversion methods rely on ad hoc bioinformatic modifications. These methods often implicitly decouple the framework regions (FRs) and complementarity-determining regions (CDRs), ignoring how structural changes in FRs can affect the conformation and function of CDRs. This can compromise binding specificity and require costly high-throughput in vitro screening.
To address this, we present DoggifAI, a transformer model that translates non-canine antibody sequences into canine ones by generating species-appropriate framework regions (FRs) conditioned on the desired CDRs. This allows the model to better preserve structural compatibility between FRs and CDRs. The model is pretrained on a T5-style text-to-text denoising task using a large multispecies antibody dataset, which enables subsequent fine-tuning on a much smaller species-specific dataset.
DoggifAI generates highly canine-like antibodies and shows promising results in preserving binding specificity. To support further progress in this field, we also release a curated dataset of over 430,000 unique canine antibody chain sequences, significantly expanding the public sequence repertoire.
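The T5-style denoising objective mentioned above can be illustrated with span corruption: contiguous residue spans are replaced by sentinel tokens in the input, and the target lists each sentinel followed by the residues it hid, ending with one closing sentinel. The sketch below is generic and is not DoggifAI's code; the per-residue tokenization and the `<extra_id_N>` sentinel names (borrowed from the Hugging Face T5 tokenizer convention) are assumptions.

```python
from typing import List, Tuple


def span_corrupt(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a sentinel token.

    Returns (input_tokens, target_tokens): the target holds each sentinel
    followed by the tokens it replaced, plus a final closing sentinel.
    """
    inp: List[str] = []
    tgt: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < pos:
            raise ValueError("spans must not overlap")
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[pos:start])  # keep the visible residues
        inp.append(sentinel)           # hide the span behind a sentinel
        tgt.append(sentinel)
        tgt.extend(tokens[start:start + length])
        pos = start + length
    inp.extend(tokens[pos:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inp, tgt
```

In a caninisation-style setting one could, hypothetically, pass the framework-region spans as the corrupted spans, so that the CDRs stay visible in the input and the model has to generate the frameworks.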
Title: DoggifAI: A transformer based approach for antibody caninisation. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100064.
Recent advancements in sequencing technologies have led to an exponential increase in adaptive immune receptor repertoire (AIRR) data. These receptors, crucial to the adaptive immune system, are believed to have strong potential for diagnostic applications. The immune repertoires represent a wealth of data, creating a growing demand for robust computational methods to analyze and interpret this vast amount of information.
In this review, we examine the application of machine learning algorithms for the classification and analysis of AIRR-seq data for different diagnostic applications. We provide a high-level division of current approaches based on their focus on repertoire-level or sequence-level features. We provide an overview of the current state of public AIRR data sets available for model training. Finally, we briefly highlight what lessons can be learned from successful AIRR diagnostic approaches and what hurdles still must be overcome.
Title: Machine learning in AIRR diagnostics: Advances and applications. Authors: Aslı Semerci, Celine AlBalaa, Brian Corrie, Dylan Duchen, Gisela Gabernet, Jinwoo Leem, Enkelejda Miho, Ulrik Stervbo, Justin Barton, Pieter Meysman, AIRR-Community. Pub Date: 2025-12-01 | DOI: 10.1016/j.immuno.2025.100062. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100062.
Pub Date: 2025-12-01 | Epub Date: 2025-10-09 | DOI: 10.1016/j.immuno.2025.100061
Sebastian Miles, Gonzalo Menafra, Andrés Iriarte, Jose Alejandro Chabalgoity
Accurate prediction of protein antigenicity is crucial for vaccine development, diagnostic test design, and therapeutic protein engineering. However, existing tools are limited in accessibility, computational efficiency, and pathogen diversity. Here, we present IApred, an open-source intrinsic antigenicity predictor that addresses these challenges. IApred employs a Support Vector Machine (SVM) model trained on a comprehensive dataset of 918 high-antigenicity proteins from diverse pathogens, including Gram-positive and Gram-negative bacteria, viruses, fungi, protozoa, and helminths. The model uses features derived from physicochemical properties, E-descriptors, amino acid dimers, and small linear motifs (SLiMs) to predict the probability that a protein elicits a humoral immune response. In external validation, IApred showed better balanced performance (ROC AUC = 0.761, sensitivity = 0.702, specificity = 0.706) than existing tools (VaxiJen 2.0, VaxiJen 3.0, and ANTIGENpro) while maintaining high computational efficiency (approximately 1000 sequences per minute). Its host- and pathogen-agnostic design and ease of integration into bioinformatic pipelines make IApred versatile across diverse applications. A web-based version is available at https://smilesinformatics.com/iapred, and the software and training code are freely available on GitHub (https://github.com/sebamiles/IAPred) and Zenodo (https://doi.org/10.5281/zenodo.14578279).
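Of the feature families listed, amino acid dimer composition is the simplest to illustrate: each protein is mapped to a 400-dimensional vector of dipeptide frequencies that a classifier such as an SVM can consume. This is a generic sketch; IApred's exact dimer encoding, normalization, and feature selection are not described here and may differ. Dimers containing non-standard residues are simply skipped, which is an assumption of this sketch.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIMERS: List[str] = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dimers
DIMER_INDEX: Dict[str, int] = {d: i for i, d in enumerate(DIMERS)}


def dimer_composition(seq: str) -> List[float]:
    """Relative frequency of each of the 400 standard amino acid dimers."""
    seq = seq.upper()
    counts = [0] * len(DIMERS)
    total = 0
    for a, b in zip(seq, seq[1:]):  # overlapping adjacent pairs
        idx = DIMER_INDEX.get(a + b)
        if idx is not None:  # skip dimers with non-standard residues (e.g. X)
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIMERS)
    return [c / total for c in counts]
```

Vectors like this would normally be concatenated with the other feature families and scaled before training the classifier.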
Title: IApred: A versatile open-source tool for predicting protein antigenicity across diverse pathogens. Immunoinformatics (Amsterdam, Netherlands), vol. 20, Article 100061.
Pub Date: 2025-09-01 | Epub Date: 2025-07-07 | DOI: 10.1016/j.immuno.2025.100051
Marc Hoffstedt, Hermann Wätzig, Knut Baumann
Various methods, differing in complexity, have been developed to predict T-cell receptor epitopes. tcrdist3, which implements an easy-to-interpret distance-based approach, has demonstrated performance comparable to the best feature-based methods. Here, a new substitution matrix for tcrdist3 is proposed and its performance is compared with that of various other substitution matrices. Small performance gains were possible; however, tcrdist3 was found to perform reliably well with most substitution matrices. Randomly generated substitution matrices, used as a baseline, yielded good classification results. Prediction quality was negatively correlated with the relative standard deviation of the matrix used (i.e., a larger variance of the weights resulted in poorer predictivity). The single most important factor in the tcrdist3 distance between two sequences was the number of substitutions. tcrdist3 implicitly considers the number of substitutions and the type of substitution simultaneously. Using substitution matrices with larger variance penalizes certain substitutions more strongly, which blurs the clusters of sequences with the same number of substitutions. Since the number of substitutions was a key predictor, this blurring decreased prediction performance.
Title: Comparison of different substitution matrices for distance based T-cell receptor epitope predictions using tcrdist3. Immunoinformatics (Amsterdam, Netherlands), Vol. 19, Article 100051 (published 2025-09-01).
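The argument about matrix variance can be made concrete with a deliberately simplified, tcrdist-like distance. This sketch handles only equal-length, ungapped sequences and sums a per-substitution penalty over mismatched positions. It is not the tcrdist3 implementation. The penalty matrices are random and hypothetical, and so are the helper names. With a constant penalty (zero relative standard deviation), the distance is exactly proportional to the number of substitutions. With a high-variance matrix, sequences with the same number of substitutions spread over a range of distances.

```python
import random
import statistics
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Penalties = dict[tuple[str, str], float]


def make_penalties(rng: random.Random, spread: float) -> Penalties:
    """Hypothetical symmetric substitution-penalty matrix (off-diagonal only).

    Each penalty is 4.0 plus uniform noise in [-spread, spread];
    spread=0.0 gives a constant, zero-variance matrix.
    """
    penalties: Penalties = {}
    for a, b in combinations(AMINO_ACIDS, 2):
        p = 4.0 + rng.uniform(-spread, spread)
        penalties[(a, b)] = penalties[(b, a)] = p
    return penalties


def relative_std(penalties: Penalties) -> float:
    """Relative standard deviation (population std / mean) of the weights."""
    values = list(penalties.values())
    return statistics.pstdev(values) / statistics.mean(values)


def distance(s1: str, s2: str, penalties: Penalties) -> float:
    """Sum of penalties over mismatched positions of two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    return sum(penalties[(a, b)] for a, b in zip(s1, s2) if a != b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Reference CDR3-like sequence, a 1-substitution and a 2-substitution variant.
    ref, one_sub, two_sub = "CASSLGQAYEQYF", "CASSPGQAYEQYF", "CASSPGQTYEQYF"
    for spread in (0.0, 3.5):
        pen = make_penalties(rng, spread)
        print(f"RSD={relative_std(pen):.3f}  "
              f"1 sub: {distance(ref, one_sub, pen):.2f}  "
              f"2 subs: {distance(ref, two_sub, pen):.2f}")
```

With `spread=3.5`, individual penalties range from 0.5 to 7.5, so a one-substitution pair can end up farther apart than some two-substitution pairs. This is the blurring of substitution-count clusters that the abstract describes.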
Pub Date: 2025-09-01 | Epub Date: 2025-06-27 | DOI: 10.1016/j.immuno.2025.100055
Marni E. Cueno, Kenichi Imai
Conformational changes in the SARS-CoV-2 (SARS2) spike protein are critical for understanding viral evolution. In this study, we provide comparative structural and electrostatic analyses across variants, revealing differentiation and reversion patterns in the locked and activated spike conformations that have not been previously described. More specifically, we generated SARS2 spike protein models from the various variants recorded between December 2019 and November 2021 and performed structural superimposition, dendrogram analyses, and electrostatic mapping. We confirmed which locked and activated conformations differed, and which reverted, between the Original spike protein model and subsequent SARS2 variants and subvariants. Additionally, among the spike protein models of the subsequent SARS2 variants and subvariants from December 2019 to November 2021, we likewise established structural variations and reversions in the locked and activated conformations. Moreover, we established the structural relationships and clustering among the locked and activated conformations of the SARS2 spike protein models. Furthermore, we determined the electrostatic potential of all generated SARS2 spike protein models to establish their surface charge distributions. Taken together, we found that certain locked and activated conformations of the Original SARS2 spike protein models exhibited both structural differences and, surprisingly, reversion when compared with subsequent variants and subvariants. Structural differentiation and reversion were likewise observed in the locked and activated conformations across the spike protein models.
Additionally, we identified distinct structural clusters within the locked and activated conformations, establishing a structural relationship among certain SARS2 spike protein models. Moreover, we found that, during spike evolution, the surface charge distribution is reorganized as structural differentiation and reversion occur.
Title: Structural insights on the differentiation and reversion of conformational changes in SARS-CoV-2 spike protein models across variants occurring from December, 2019 to November, 2021. Immunoinformatics (Amsterdam, Netherlands), Vol. 19, Article 100055 (published 2025-09-01).
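The abstract does not say which tools or metrics were used for structural superimposition. As an illustration only, a common way to quantify how far two structural models differ after superimposition is the root-mean-square deviation (RMSD) under the optimal rigid-body rotation, found with the Kabsch algorithm via singular value decomposition. The sketch assumes both models have already been reduced to matched atom coordinates (for example, C-alpha atoms of aligned residues). The function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Assumes row i of P corresponds to row i of Q (e.g. matched C-alpha atoms).
    """
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate the centred P onto Q and average the squared per-atom deviations.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_a = rng.normal(size=(100, 3))  # stand-in for matched C-alpha coordinates
    theta = 0.7
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    model_b = model_a @ rot_z.T + np.array([1.0, -2.0, 3.0])  # rotated, shifted copy
    print(kabsch_rmsd(model_a, model_b))  # close to 0 for a rigid-body copy
```

One possible way to obtain dendrograms from such measurements is to fill a pairwise RMSD matrix over all models, convert it to condensed form with `scipy.spatial.distance.squareform`, and pass it to `scipy.cluster.hierarchy.linkage` and `dendrogram`. Whether the study clustered on RMSD is not stated in the abstract.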