Pub Date : 2025-11-07 DOI: 10.1016/j.immuno.2025.100064
Dominik Grabarczyk, Mikołaj Kocikowski, Maciej Parys, Douglas R. Houston, Ted Hupp, Javier Antonio Alfaro, Shay B. Cohen
Antibody translation across species offers a compelling strategy to extend the vast and expensive investments in human therapeutic antibodies to veterinary oncology, with applications in both veterinary medicine and comparative oncology.
While precise, low-immunogenic treatments are essential for canine cancer care, traditional species conversion methods rely on ad hoc bioinformatics modifications. These methods often implicitly decouple the framework (FR) and complementarity-determining regions (CDRs), ignoring how structural changes in FRs can affect the conformation and function of CDRs. This can compromise binding specificity and require costly high-throughput in vitro screening.
To address this, we present DoggifAI, a transformer model that translates non-canine antibody sequences into canine ones by generating species-appropriate framework regions (FRs) based on desired CDRs. This allows the model to better preserve structural compatibility between FRs and CDRs. The model is pretrained in a T5-style text-to-text denoising task on a large multispecies antibody dataset, which allows further finetuning on a much smaller species-specific dataset.
DoggifAI generates highly canine-like antibodies and shows promising results in preserving binding specificity. To support further progress in this field, we also release a curated dataset of over 430,000 unique canine antibody chain sequences, significantly expanding the public sequence repertoire.
Title: DoggifAI: A transformer based approach for antibody caninisation
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100064
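The T5-style text-to-text denoising objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the masking scheme DoggifAI actually uses, so the span positions, the per-residue tokenisation, and the helper name `span_corrupt` are assumptions made for illustration; only the `<extra_id_N>` sentinel convention comes from the general T5 format.

```python
def span_corrupt(tokens, spans):
    """Apply T5-style span corruption to a token list.

    `spans` is a sorted list of non-overlapping, half-open (start, end)
    index pairs. Each span is replaced by one sentinel in the input, and
    the target lists each sentinel followed by the tokens it hides.
    Hypothetical helper, not taken from the DoggifAI code base.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked prefix
        inputs.append(sentinel)              # hide the span behind a sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the model must regenerate these
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the unmasked tail
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


# Toy fragment of an antibody chain, one token per residue (assumption).
sequence = list("EVQLVESGG")
corrupted, target = span_corrupt(sequence, [(1, 3), (6, 8)])
print(" ".join(corrupted))  # E <extra_id_0> L V E <extra_id_1> G
print(" ".join(target))     # <extra_id_0> V Q <extra_id_1> S G <extra_id_2>
```

In a setting like the one the abstract describes, the masked spans would presumably cover framework residues while the CDRs stay visible, but that mapping is a guess and is not stated in the source.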
Pub Date : 2025-10-21 DOI: 10.1016/j.immuno.2025.100063
Kerry A. Mullan, Sebastiaan Valkiers, Nicky de Vrij, Chen Li, Sara Verbandt, Ting Pu, Pieter Meysman
The current state of single-cell transcriptomic interrogation typically consists of using an unsupervised clustering approach followed by expert opinion-based annotation. The underlying assumption is that this process will accurately identify transcriptional differences between cellular subsets and thus be able to cluster, for example, CD8+ T cells apart from CD4+ T cells. However, this widely applied assumption that the clustering reflects T-cell biology has never been validated. We used a large T-cell atlas (V2) that combined twelve 10x Genomics single T-cell transcriptomics datasets (∼500 K cells), as well as an independent CITE-seq dataset, to assess whether the unsupervised clustering produced by Seurat reflected the biology. Annotations were then evaluated using the expression of key marker genes. The main T-cell markers CD8 and CD4 were mixed in most clusters, regardless of the feature selection and of whether principal/harmony components or features were used. The factors driving the clustering were related to cellular functions (glucose metabolism) and to T-cell receptor (TCR), immunoglobulin and HLA transcripts, and not to typical markers. Against current assumptions, the clustering was not driven by the T-cell phenotypes and could not accurately segregate CD4+ from CD8+ T cells, let alone the sub-classifications. This implies that many of the T cells would be incorrectly classified under the standard cluster-based annotation approach. Methods relying on unsupervised clustering should be used with care, as improper handling can misrepresent the data, and alternatives such as semi-supervised approaches with TCR-seq or protein-based annotations should be preferred.
Title: Where single-cell transcriptomics fails T cells: The misuse of unsupervised clustering for T-cell annotation
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100063
Pub Date : 2025-10-21 DOI: 10.1016/j.immuno.2025.100062
Aslı Semerci, Celine AlBalaa, Brian Corrie, Dylan Duchen, Gisela Gabernet, Jinwoo Leem, Enkelejda Miho, Ulrik Stervbo, Justin Barton, Pieter Meysman, AIRR-Community
Recent advancements in sequencing technologies have led to an exponential increase in adaptive immune receptor repertoire (AIRR) data. These receptors, crucial to the adaptive immune system, are believed to have strong potential for diagnostic applications. The immune repertoires represent a wealth of data, creating a growing demand for robust computational methods to analyze and interpret this vast amount of information.
In this review, we examine the application of machine learning algorithms for the classification and analysis of AIRR-seq data for different diagnostic applications. We provide a high-level division of current approaches based on their focus on repertoire-level or sequence-level features. We provide an overview of the current state of public AIRR data sets available for model training. Finally, we briefly highlight what lessons can be learned from successful AIRR diagnostic approaches and what hurdles still must be overcome.
Title: Machine learning in AIRR diagnostics: Advances and applications
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100062
Pub Date : 2025-10-09 DOI: 10.1016/j.immuno.2025.100061
Sebastian Miles, Gonzalo Menafra, Andrés Iriarte, Jose Alejandro Chabalgoity
Accurate prediction of protein antigenicity is crucial for vaccine development, diagnostic test design, and therapeutic protein engineering. However, existing tools face limitations in accessibility, computational efficiency, and pathogen diversity. Here, we present IApred, an open-source intrinsic antigenicity predictor that addresses these challenges. IApred employs a Support Vector Machine (SVM) model trained on a comprehensive dataset of 918 high-antigenicity proteins from diverse pathogens, including Gram-positive and Gram-negative bacteria, viruses, fungi, protozoa, and helminths. The model incorporates features derived from physicochemical properties, E-descriptors, amino acid dimers and small linear motifs (SLiMs) to predict the probability of a protein eliciting a humoral immune response. In external validation, IApred demonstrated superior balanced performance (ROC AUC = 0.761, sensitivity = 0.702, specificity = 0.706) compared to existing tools (VaxiJen 2.0, VaxiJen 3.0 and ANTIGENpro), while maintaining high computational efficiency (approximately 1000 sequences per minute). IApred's host-and-pathogen-agnostic nature and its capability for integration into bioinformatic pipelines make it versatile for diverse applications. A web-based version of the software is available at https://smilesinformatics.com/iapred, while the software and training code are freely available on GitHub (https://github.com/sebamiles/IAPred) and Zenodo (https://doi.org/10.5281/zenodo.14578279).
Title: IApred: A versatile open-source tool for predicting protein antigenicity across diverse pathogens
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100061
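As a small worked example of how the sensitivity and specificity quoted in the IApred abstract combine into a single figure, the sketch below computes balanced accuracy as their mean. The abstract does not say that the authors report balanced accuracy, so this is only an illustration; the function names and the toy confusion-matrix counts are hypothetical.

```python
def rates_from_counts(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true antigens recovered
    specificity = tn / (tn + fp)  # fraction of non-antigens rejected
    return sensitivity, specificity


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Values quoted in the abstract for the external validation set.
reported = balanced_accuracy(0.702, 0.706)
print(round(reported, 3))  # 0.704

# Toy counts (made up) showing the same calculation from raw predictions.
toy_sensitivity, toy_specificity = rates_from_counts(tp=7, fn=3, tn=8, fp=2)
print(balanced_accuracy(toy_sensitivity, toy_specificity))
```

Sensitivity and specificity that sit this close together are one way to read the abstract's phrase "balanced performance": the classifier is not trading one error type for the other.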
Pub Date : 2025-09-16 DOI: 10.1016/j.immuno.2025.100060
Anna Niarakis, Gary An, Luiz Ladeira, Noriko F. Hiroi, Athina Papadopoulou, Francis P. Crawley, Niloofar Nikaein, Laurence Calzone, Eirini Tsirvouli, Hasan Balci, Marina Esteban Medina, Lorenzo Veschini, Ozan Ozisik, Francesco Messina, Malvina Marku, Van Du T. Tran, Arnau Montagud, Nikola Schlosserova, Yashwanth Subbannayya, Martina Kutmon, Reinhard Laubenbacher
Digital twins, initially developed for industrial applications, are set to make significant advancements in medicine and healthcare. They have demonstrated promising potential for drug development and personalised care, especially in cardiovascular diagnostics and insulin-dependent diabetes management. A particularly compelling application lies in immune responses and immune-mediated diseases, given the immune system’s essential role in preserving human health, from fighting infections to managing autoimmune diseases. Creating Immune Digital Twins (IDTs) holds great promise for medicine and healthcare. At the same time, the development of a reliable and robust IDT presents significant challenges due to the inherent complexity and polymorphism of the human immune system, the difficulties in measuring patients’ immune state in vivo, and the intrinsic difficulties associated with modelling complex biological systems and processes.
The Working Group “Building Immune Digital Twins” (BIDT WG) aims to address these challenges by fostering transdisciplinary collaborations among immunologists, clinicians, experimentalists, computational biologists, and engineers. The international network is leveraging its cross-disciplinary expertise to build the components required for a working IDT model. Moreover, the BIDT WG focuses on creating an open-access model repository for publicly available immune-related computational models and their required metadata. The group is also active in cataloguing open-access tools, methodologies, and software to identify interoperability gaps in the current modelling landscape.
Consequently, this work can drive transformative innovations in precision medicine, unlocking new possibilities for the diagnosis, treatment, and management of immune-mediated diseases.
Title: Building immune digital twins: An international and transdisciplinary community effort
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100060
Pub Date : 2025-09-01 DOI: 10.1016/j.immuno.2025.100057
Eve Richardson, Lisa Willemsen, Pramod Shinde, Morten Nielsen, Bjoern Peters
Vaccines trigger an immune response that results in a population of memory cells that can quickly respond to subsequent antigen re-encounters. Most vaccines are designed to induce memory B cells with vaccine-specific B cell receptors (BCRs). Post-vaccination, clonal expansion of B cells results in measurably expanded vaccine-specific BCR clonotypes. We set out to determine to what extent it is predictable which specific BCR clonotypes are vaccine-induced in an individual. We sequenced the BCR heavy chain repertoire in a cohort of 19 individuals before and 7 days after Tdap booster vaccination. We tested two modalities to predict which clonotypes were expanded post-vaccination. First, we utilized a small database of monoclonal antibodies with known specificity to Tdap vaccine antigens and tested various sequence look-up methods, identifying clonal look-up as the best method. We then utilized a leave-one-out approach in which expanded clonotypes in one individual were predicted using data from other members of the cohort. The second approach significantly outperformed the first, indicating that BCR clonotype expansion can be learned across subjects. These results support the utility of systematically collecting BCR specificity data through efforts like the Immune Epitope Database and highlight the limitations on general prediction approaches resulting from relatively small dataset sizes for BCRs with known specificities. Additionally, our study provides 1) a comparison of several BCR specificity prediction methods, 2) a dataset that can be used for benchmarking of subsequent methods, and 3) a methodological framework for comparing BCR repertoires pre- and post-vaccination.
Title: Is the vaccination-induced B cell receptor repertoire predictable?
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 19, Article 100057
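The leave-one-out idea in the abstract above, predicting the expanded clonotypes of one individual from the rest of the cohort, can be sketched as follows. This is not the authors' method: the abstract does not say how clonotypes are matched or scored, so exact matching on a clonotype identifier, the function name `leave_one_subject_out`, and the toy cohort are all assumptions made for illustration.

```python
def leave_one_subject_out(repertoires, expanded):
    """Score cross-subject prediction of expanded clonotypes.

    repertoires: subject -> set of all clonotype IDs seen in that subject
    expanded:    subject -> set of clonotype IDs expanded post-vaccination
    Returns subject -> (precision, recall). Hypothetical helper.
    """
    scores = {}
    for held_out in repertoires:
        # Pool the expanded clonotypes of every other subject.
        reference = set()
        for subject, clonotypes in expanded.items():
            if subject != held_out:
                reference |= clonotypes
        # Predict "expanded" for held-out clonotypes found in the pool.
        predicted = repertoires[held_out] & reference
        truth = expanded[held_out]
        hits = len(predicted & truth)
        precision = hits / len(predicted) if predicted else 0.0
        recall = hits / len(truth) if truth else 0.0
        scores[held_out] = (precision, recall)
    return scores


# Toy cohort with placeholder clonotype IDs (made up).
repertoires = {"s1": {"c1", "c2", "c3"}, "s2": {"c1", "c4"}, "s3": {"c2", "c4", "c5"}}
expanded = {"s1": {"c1", "c3"}, "s2": {"c1", "c4"}, "s3": {"c4"}}
scores = leave_one_subject_out(repertoires, expanded)
print(scores["s1"])  # (1.0, 0.5)
```

Subject `s1` shows the limit of this kind of look-up: clonotype `c3` is expanded only in `s1`, so no amount of data from the other subjects can recover it.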
Pub Date : 2025-08-28 DOI: 10.1016/j.immuno.2025.100058
James M. Heather, Ayelet Peres, Gur Yaari, William Lees
The rise of T cell receptor (TCR) sequencing technologies is driving both new understandings of the immune system and the development of novel clinical platforms. Such analyses rely on comparing recombined TCR sequences to unrearranged germline reference sequences during V(D)J annotation. In this study we observed that, despite the importance of this step in TCR analysis, most published studies do not properly report the reference used. We use public datasets to illustrate why references should be explicitly specified: using IMGT/GENE-DB as an example, we document how the reference set changes over time. Furthermore, we illustrate how prescriptivist interpretations of reference metadata may be obscuring rather than illuminating TCR biology, and demonstrate the need to perform full V gene sequencing in order to unambiguously determine the final translated TCR polypeptide sequence. In summary, we argue that to ensure the accuracy and reproducibility of TCR sequencing, an ever more pressing task as more TCR-based diagnostics and therapeutics are developed, we should all take more care with the development, use, and reporting of the TCR germline references used in our science.
Title: The gremlin in the works: why T cell receptor researchers need to pay more attention to germline reference sequences
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 20, Article 100058
Pub Date : 2025-08-27 DOI: 10.1016/j.immuno.2025.100059
Corey T. Watson, Andrew M. Collins, Mats Ohlin, James M. Heather, Ayelet Peres, William D. Lees, Gur Yaari
Genetic databases for immunoglobulin (IG) and T cell receptor (TR) genes have evolved from small catalogs to critical resources underpinning immunogenetic research. Accurate annotation enables the analysis of repertoire diversity, somatic hypermutation, clonal relationships, and lineage development. Recent advances in high-throughput repertoire sequencing and long-read genomics now allow for unprecedented discovery of germline variation across populations and species, but they also expose limitations of existing resources. Here, we discuss the historical evolution of IG/TR databases, highlight the challenges and opportunities presented by changing data landscapes, and outline strategies for building future databases that integrate genomic and expression data, support population diversity, and align with evolving nomenclature frameworks. Enhanced germline resources will be essential for accurate annotation, reproducible research, and the next generation of immunological discovery and clinical translation.
Title: Building immunoglobulin and T cell receptor gene databases for the future
Journal: Immunoinformatics (Amsterdam, Netherlands), Volume 19, Article 100059
Pub Date : 2025-07-17 DOI: 10.1016/j.immuno.2025.100056
Ulrik Stervbo, Paraskevas Filippidis, Felix Breden, Lindsay G. Cowell, Frederic Davi, Victor Greiff, Anton W. Langerak, Eline T. Luning Prak, Alexandra F. Sharland, Enkelejda Miho, Pieter Meysman
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a promising diagnostic method across various clinical conditions, yet its widespread implementation faces several challenges. This perspective examines the current landscape of AIRR-seq diagnostics and outlines key obstacles and opportunities for advancement. Critical challenges include the need for standardized quality controls, privacy protection under General Data Protection Regulation (GDPR) and Health Insurance Portability and Accountability Act (HIPAA) frameworks, and the development of clinically compatible bioinformatics pipelines. Machine learning approaches offer potential solutions for interpreting complex repertoire signatures, though these models must balance accuracy with interpretability for clinical adoption. Future applications may include early disease detection, prognosis, and monitoring of treatment and vaccine responses. However, successful clinical integration will require sustained collaboration among funding bodies, regulatory agencies, researchers, diagnosticians, and clinicians to establish clear guidelines and expand existing repositories with well-characterized patient samples. The collaborative efforts of the AIRR Diagnostics Working Group and the AIRR Community's initiatives are working towards unlocking the potential of AIRR-seq in precision medicine and enhancing diagnostic capabilities.
Challenges and future directions of AIRR-seq-based diagnostics. Immunoinformatics (Amsterdam, Netherlands), volume 19, Article 100056, published 2025-07-17. DOI: 10.1016/j.immuno.2025.100056
Pub Date : 2025-07-07 DOI: 10.1016/j.immuno.2025.100051
Marc Hoffstedt, Hermann Wätzig, Knut Baumann
Various methods, differing in complexity, have been developed to predict T-cell receptor epitopes. tcrdist3, which implements an easy-to-interpret distance-based approach, has demonstrated performance comparable to the best feature-based methods. Here, a new substitution matrix for tcrdist3 is proposed and its performance is compared to various other substitution matrices. Small performance gains were possible; however, tcrdist3 was found to perform reliably well with most substitution matrices. Randomly generated substitution matrices were used as a baseline and still yielded good classification results. Prediction quality was negatively correlated with the relative standard deviation of the matrix used (i.e., a larger variance of the weights resulted in poorer predictivity). The most important single factor in the tcrdist3 distance between two sequences was the number of substitutions. tcrdist3 implicitly considers the number of substitutions and the type of substitution simultaneously. Using substitution matrices with larger variance penalizes certain substitutions more strongly, which blurs the clusters of sequences with the same number of substitutions. Since the number of substitutions was a key predictor, this resulted in decreased prediction performance.
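The relationship described in this abstract between matrix variance and the number of substitutions can be illustrated with a small sketch. It is a toy model under stated assumptions, not tcrdist3's API or its actual distance: the functions `make_matrix`, `relative_std`, `n_substitutions`, and `weighted_distance` are hypothetical names, the matrices are random, the example CDR3 strings are made up, and only equal-length sequences without gaps are handled.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_matrix(low, high, seed=0):
    """Build a symmetric toy substitution-cost matrix (hypothetical helper).

    Off-diagonal costs are drawn uniformly from [low, high]; identical
    residues are not stored because they contribute no cost.
    """
    rng = random.Random(seed)
    matrix = {}
    for i, a in enumerate(AMINO_ACIDS):
        for b in AMINO_ACIDS[i + 1:]:
            cost = rng.uniform(low, high)
            matrix[(a, b)] = cost
            matrix[(b, a)] = cost
    return matrix


def relative_std(matrix):
    """Relative standard deviation (std / mean) of the substitution costs."""
    costs = list(matrix.values())
    return statistics.pstdev(costs) / statistics.mean(costs)


def n_substitutions(seq1, seq2):
    """Count mismatched positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(a != b for a, b in zip(seq1, seq2))


def weighted_distance(seq1, seq2, matrix):
    """Sum the substitution costs over all mismatched positions."""
    if len(seq1) != len(seq2):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2) if a != b)


# Made-up CDR3-like strings: two single-substitution neighbours and one
# double-substitution neighbour of the reference.
reference = "CASSLGQAYEQYF"
neighbours = ["CASSLGQGYEQYF", "CASSWGQAYEQYF", "CASSWGQGYEQYF"]

flat = make_matrix(1.0, 1.0)   # every substitution costs the same
wide = make_matrix(0.1, 4.0)   # costs vary strongly between substitutions

for name, matrix in [("flat", flat), ("wide", wide)]:
    print(f"{name}: relative std = {relative_std(matrix):.2f}")
    for seq in neighbours:
        subs = n_substitutions(reference, seq)
        dist = weighted_distance(reference, seq, matrix)
        print(f"  {seq}  substitutions={subs}  distance={dist:.2f}")
```

With the flat matrix the distance equals the substitution count, so sequences with the same number of substitutions stay together. With the wide matrix, the two single-substitution neighbours receive different distances, which is the blurring effect the abstract attributes to high-variance matrices.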
Comparison of different substitution matrices for distance based T-cell receptor epitope predictions using tcrdist3. Immunoinformatics (Amsterdam, Netherlands), volume 19, Article 100051, published 2025-07-07. DOI: 10.1016/j.immuno.2025.100051