Immunoinformatics (Amsterdam, Netherlands): Latest Articles

Active learning for improving out-of-distribution lab-in-the-loop experimental design

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2026-01-21 DOI: 10.1016/j.immuno.2026.100065

Daria Balashova , Robert Frank , Svetlana Kuzyakina , Dominique Weltevreden , Philippe A. Robert , Geir Kjetil Sandve , Victor Greiff

The accurate prediction of antibody-antigen binding is crucial for developing antibody-based therapeutics and advancing immunological research. Library-on-library approaches, where many antigens are probed against many antibodies, can identify specific interacting pairs. Machine learning models can predict target binding by analyzing many-to-many relationships between antibodies and antigens. However, these models struggle when the test antibodies and antigens are not represented in the training data, a scenario known as out-of-distribution prediction. Generating experimental binding data is costly, which limits the availability of comprehensive datasets. Active learning can reduce costs by starting with a small labeled subset of data and iteratively expanding the labeled dataset. Few active learning approaches can handle data with many-to-many relationships, such as those obtained from library-on-library screening. In this study, we adapted twelve active learning strategies for antibody-antigen binding prediction in a library-on-library setting and evaluated their out-of-distribution performance using the Absolut! simulation framework. We found that three of the twelve algorithms tested modestly but significantly outperformed the baseline in which randomly selected data are iteratively labeled. The best algorithm reduced the number of required antigen mutant variants by up to 12.5% compared to the random baseline. These findings demonstrate that active learning can improve experimental efficiency in a library-on-library setting and advance antibody-antigen binding prediction.
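
The iterative labeling loop described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 1-D toy pool, the nearest-centroid model, the margin-based uncertainty score, and the names `active_learning`, `fit_centroids` and `uncertainty` are hypothetical and are not the twelve strategies or the Absolut! setup used in the study. It only shows how a random baseline and an acquisition strategy plug into the same loop.

```python
import random

def fit_centroids(labeled):
    """Fit a 1-D nearest-centroid classifier: return (mean of class 0, mean of class 1)."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)

def uncertainty(x, centroids):
    """Higher score = smaller margin between the two centroid distances."""
    neg_c, pos_c = centroids
    return -abs(abs(x - neg_c) - abs(x - pos_c))

def active_learning(pool, oracle, seed_items, rounds, batch_size, strategy, rng):
    """Label `batch_size` pool items per round; return the list of (item, label) pairs."""
    labeled = [(x, oracle(x)) for x in seed_items]
    unlabeled = [x for x in pool if x not in seed_items]
    for _ in range(rounds):
        if not unlabeled:
            break
        if strategy == "random":
            # Baseline: pick the next batch uniformly at random.
            batch = rng.sample(unlabeled, min(batch_size, len(unlabeled)))
        else:
            # Uncertainty sampling: refit, then pick the items closest to the boundary.
            centroids = fit_centroids(labeled)
            ranked = sorted(unlabeled, key=lambda x: uncertainty(x, centroids), reverse=True)
            batch = ranked[:batch_size]
        labeled.extend((x, oracle(x)) for x in batch)
        unlabeled = [x for x in unlabeled if x not in batch]
    return labeled

# Toy stand-in for a binding assay: items at or above 0.6 "bind" (hypothetical rule).
pool = [i / 100 for i in range(100)]
oracle = lambda x: int(x >= 0.6)
seed = [pool[0], pool[-1]]  # the seed set must contain both classes

labeled_random = active_learning(pool, oracle, seed, 3, 2, "random", random.Random(0))
labeled_uncertain = active_learning(pool, oracle, seed, 3, 2, "uncertainty", random.Random(0))
```

In a library-on-library setting the pool items would be antibody-antigen pairs or antigen variants rather than numbers, and the oracle would be an experiment or simulation; the loop structure is the part this sketch is meant to convey.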

Volume 21, Article 100065

Citations: 0

DoggifAI: A transformer based approach for antibody caninisation

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-11-07 DOI: 10.1016/j.immuno.2025.100064

Dominik Grabarczyk , Mikołaj Kocikowski , Maciej Parys , Douglas R. Houston , Ted Hupp , Javier Antonio Alfaro , Shay B. Cohen

Antibody translation across species offers a compelling strategy to extend the vast and expensive investments in human therapeutic antibodies to veterinary oncology, with applications in both veterinary medicine and comparative oncology.

While precise, low-immunogenic treatments are essential for canine cancer care, traditional species conversion methods rely on ad hoc bioinformatics modifications. These methods often implicitly decouple the framework regions (FRs) from the complementarity-determining regions (CDRs), ignoring how structural changes in FRs can affect the conformation and function of CDRs. This can compromise binding specificity and require costly high-throughput in vitro screening.

To address this, we present DoggifAI, a transformer model that translates non-canine antibody sequences into canine ones by generating species-appropriate framework regions (FRs) based on desired CDRs. This allows the model to better preserve structural compatibility between FRs and CDRs. The model is pretrained on a T5-style text-to-text denoising task using a large multispecies antibody dataset, which allows further fine-tuning on a much smaller species-specific dataset.

DoggifAI generates highly canine-like antibodies and shows promising results in preserving binding specificity. To support further progress in this field, we also release a curated dataset of over 430,000 unique canine antibody chain sequences, significantly expanding the public sequence repertoire.
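
The T5-style text-to-text denoising task mentioned in this abstract can be sketched as span corruption on a residue sequence. This is a sketch under stated assumptions, not DoggifAI's actual preprocessing: the function name `span_corrupt`, the per-residue tokenization, the example sequence fragment and the hand-picked spans are hypothetical. Only the `<extra_id_N>` sentinel convention and the closing sentinel in the target follow the general T5 format.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel; return (input, target) token lists.

    `spans` must be sorted, non-overlapping, half-open index ranges.
    """
    inp, tgt, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[cursor:start])  # keep the uncorrupted residues
        inp.append(sentinel)              # mark where a span was dropped
        tgt.append(sentinel)              # target: sentinel followed by the dropped span
        tgt.extend(tokens[start:end])
        cursor = end
    inp.extend(tokens[cursor:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel, as in the T5 format
    return inp, tgt

# Hypothetical fragment, one token per residue; spans are chosen by hand here,
# whereas pretraining would normally sample them at random.
seq = list("EVQLVESGGG")
model_input, target = span_corrupt(seq, [(2, 4), (7, 9)])
print(" ".join(model_input))  # E V <extra_id_0> V E S <extra_id_1> G
print(" ".join(target))       # <extra_id_0> Q L <extra_id_1> G G <extra_id_2>
```

For the translation task itself, one could imagine the masked spans covering framework regions while the CDRs stay visible in the input, but that mapping is an assumption and is not specified in the abstract.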

Volume 20, Article 100064

Citations: 0

Where single-cell transcriptomics fails T cells: The misuse of unsupervised clustering for T-cell annotation

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-10-21 DOI: 10.1016/j.immuno.2025.100063

Kerry A. Mullan , Sebastiaan Valkiers , Nicky de Vrij , Chen Li , Sara Verbandt , Ting Pu , Pieter Meysman

The current state of single-cell transcriptomic interrogation typically consists of using an unsupervised clustering approach followed by expert opinion-based annotation. The underlying assumption is that this process will accurately identify transcriptional differences between cellular subsets, and thus be able to cluster, for example, CD8+ T cells apart from CD4+ T cells. However, this widely applied assumption that the clustering reflects T-cell biology has never been validated. We used a large T-cell atlas (V2) that combined twelve 10x Genomics single T-cell transcriptomics datasets (~500,000 cells), as well as an independent CITE-seq dataset, to assess whether the unsupervised clustering produced by Seurat reflected the biology. Annotations were then evaluated using the expression of key marker genes. The main T-cell markers CD8 and CD4 were mixed in most clusters, regardless of the feature selection and of whether principal/Harmony components or features were used. The factors driving the clustering were instead related to cellular functions (glucose metabolism) and to T-cell receptor (TCR), immunoglobulin and HLA transcripts, rather than to typical markers. Against current assumptions, the clustering was not driven by T-cell phenotypes and could not accurately segregate CD4+ from CD8+ T cells, let alone the sub-classifications. This implies that many T cells would be incorrectly classified by the standard cluster-based annotation approach. Methods relying on unsupervised clustering should be used with care, as improper handling can misrepresent the data, and alternatives such as semi-supervised approaches with TCR-seq or protein-based annotations should be preferred.
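
The kind of check described in this abstract, asking whether clusters separate CD4+ from CD8+ cells, can be sketched as a per-cluster purity score. This is a minimal illustration and not the authors' evaluation: the function names, the zero expression threshold, and the toy values are assumptions, and a real analysis would read marker expression (for example for CD4 and CD8A) from the clustered dataset.

```python
from collections import defaultdict

def lineage(cd4, cd8, threshold=0.0):
    """Call a cell CD4, CD8, double-positive ("DP") or double-negative ("DN")."""
    has_cd4, has_cd8 = cd4 > threshold, cd8 > threshold
    if has_cd4 and has_cd8:
        return "DP"
    if has_cd4:
        return "CD4"
    if has_cd8:
        return "CD8"
    return "DN"

def cluster_purity(cells):
    """cells: iterable of (cluster_id, cd4_expr, cd8_expr).

    Returns {cluster_id: fraction of single-positive cells in the majority lineage}.
    A value near 0.5 means CD4 and CD8 cells are mixed; 1.0 means a clean cluster.
    """
    counts = defaultdict(lambda: {"CD4": 0, "CD8": 0})
    for cluster, cd4, cd8 in cells:
        call = lineage(cd4, cd8)
        if call in ("CD4", "CD8"):  # DP and DN cells are left out of the score
            counts[cluster][call] += 1
    return {
        c: max(n.values()) / sum(n.values())
        for c, n in counts.items()
        if sum(n.values()) > 0
    }

# Toy data: cluster 0 is an even mix, cluster 1 is mostly CD8.
cells = [
    (0, 2.0, 0.0), (0, 1.5, 0.0), (0, 0.0, 3.0), (0, 0.0, 1.0),
    (1, 0.0, 2.0), (1, 0.0, 1.0), (1, 0.0, 0.5), (1, 1.0, 0.0),
]
purity = cluster_purity(cells)
```

Because transcript dropout is common in single-cell data, a zero threshold will call many cells double-negative; protein-level measurements such as those from CITE-seq are one way to get more reliable lineage calls.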

Volume 20, Article 100063

Citations: 0

Machine learning in AIRR diagnostics: Advances and applications

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-10-21 DOI: 10.1016/j.immuno.2025.100062

Aslı Semerci , Celine AlBalaa , Brian Corrie , Dylan Duchen , Gisela Gabernet , Jinwoo Leem , Enkelejda Miho , Ulrik Stervbo , Justin Barton , Pieter Meysman , AIRR-Community

Recent advancements in sequencing technologies have led to an exponential increase in adaptive immune receptor repertoire (AIRR) data. These receptors, crucial to the adaptive immune system, are believed to have strong potential for diagnostic applications. The immune repertoires represent a wealth of data, creating a growing demand for robust computational methods to analyze and interpret this vast amount of information.

In this review, we examine the application of machine learning algorithms for the classification and analysis of AIRR-seq data for different diagnostic applications. We provide a high-level division of current approaches based on their focus on repertoire-level or sequence-level features. We provide an overview of the current state of public AIRR data sets available for model training. Finally, we briefly highlight what lessons can be learned from successful AIRR diagnostic approaches and what hurdles still must be overcome.

Volume 20, Article 100062

Citations: 0

IApred: A versatile open-source tool for predicting protein antigenicity across diverse pathogens

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-10-09 DOI: 10.1016/j.immuno.2025.100061

Sebastian Miles, Gonzalo Menafra, Andrés Iriarte, Jose Alejandro Chabalgoity

Accurate prediction of protein antigenicity is crucial for vaccine development, diagnostic test design, and therapeutic protein engineering. However, existing tools face limitations in accessibility, computational efficiency, and pathogen diversity. Here, we present IApred, an open-source intrinsic antigenicity predictor that addresses these challenges. IApred employs a Support Vector Machine (SVM) model trained on a comprehensive dataset of 918 high-antigenicity proteins from diverse pathogens, including Gram-positive and Gram-negative bacteria, viruses, fungi, protozoa, and helminths. The model incorporates features derived from physicochemical properties, E-descriptors, amino acid dimers and small linear motifs (SLiMs) to predict the probability of a protein eliciting a humoral immune response. In external validation, IApred demonstrated superior balanced performance (ROC AUC = 0.761, sensitivity = 0.702, specificity = 0.706) compared to existing tools (VaxiJen 2.0, VaxiJen 3.0 and ANTIGENpro), while maintaining high computational efficiency (approximately 1000 sequences per minute). IApred's host-and-pathogen-agnostic nature and its capacity for integration into bioinformatic pipelines make it versatile for diverse applications. A web-based version of the software is available at https://smilesinformatics.com/iapred, while the software and training code are freely available on GitHub (https://github.com/sebamiles/IAPred) and Zenodo (https://doi.org/10.5281/zenodo.14578279).
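
As an illustration of one of the feature families named in this abstract, the sketch below computes an amino acid dimer composition vector. It assumes that "dimers" means overlapping pairs of adjacent residues; the function name `dimer_composition` and the handling of non-standard residues are hypothetical and are not taken from the IApred code base.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def dimer_composition(sequence):
    """Return a 400-long vector of overlapping dimer frequencies for a protein.

    Expects an upper-case one-letter sequence; pairs containing non-standard
    residues (X, B, Z, ...) are skipped.
    """
    counts = dict.fromkeys(DIMERS, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    # Normalize so that proteins of different lengths are comparable.
    return [counts[d] / total if total else 0.0 for d in DIMERS]

# "MKKLA" contains the dimers MK, KK, KL and LA, each with frequency 0.25.
features = dimer_composition("MKKLA")
```

A fixed-length vector like this can be concatenated with other descriptors and passed to a classifier such as an SVM; how IApred combines and scales its features is not described in the abstract.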

Volume 20, Article 100061

Citations: 0

Building immune digital twins: An international and transdisciplinary community effort

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-09-16 DOI: 10.1016/j.immuno.2025.100060

Anna Niarakis , Gary An , Luiz Ladeira , Noriko F. Hiroi , Athina Papadopoulou , Francis P. Crawley , Niloofar Nikaein , Laurence Calzone , Eirini Tsirvouli , Hasan Balci , Marina Esteban Medina , Lorenzo Veschini , Ozan Ozisik , Francesco Messina , Malvina Marku , Van Du T. Tran , Arnau Montagud , Nikola Schlosserova , Yashwanth Subbannayya , Martina Kutmon , Reinhard Laubenbacher

Digital twins, initially developed for industrial applications, are set to make significant advancements in medicine and healthcare. They have demonstrated promising potential for drug development and personalised care, especially in cardiovascular diagnostics and insulin-dependent diabetes management. A particularly compelling application lies in immune responses and immune-mediated diseases, given the immune system’s essential role in preserving human health, from fighting infections to managing autoimmune diseases. Creating Immune Digital Twins (IDTs) holds great promise for medicine and healthcare. At the same time, the development of a reliable and robust IDT presents significant challenges due to the inherent complexity and polymorphism of the human immune system, the difficulties in measuring patients’ immune state in vivo, and the intrinsic difficulties associated with modelling complex biological systems and processes.

The Working Group “Building Immune Digital Twins” (BIDT WG) aims to address these challenges by fostering transdisciplinary collaborations among immunologists, clinicians, experimentalists, computational biologists, and engineers. The international network is leveraging its cross-disciplinary expertise to build the components required for a working IDT model. Moreover, the BIDT WG focuses on creating an open-access model repository for publicly available immune-related computational models and their required metadata. The group is also active in cataloguing open-access tools, methodologies, and software to identify interoperability gaps in the current modelling landscape.

Consequently, this work can drive transformative innovations in precision medicine, unlocking new possibilities for the diagnosis, treatment, and management of immune-mediated diseases.

Volume 20, Article 100060

Citations: 0

Is the vaccination-induced B cell receptor repertoire predictable?

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-09-01 DOI: 10.1016/j.immuno.2025.100057

Eve Richardson , Lisa Willemsen , Pramod Shinde , Morten Nielsen , Bjoern Peters

Vaccines trigger an immune response that results in a population of memory cells that can quickly respond to subsequent antigen re-encounters. Most vaccines are designed to induce memory B cells with vaccine-specific B cell receptors (BCRs). Post-vaccination, clonal expansion of B cells results in measurably expanded vaccine-specific BCR clonotypes. We set out to determine to what extent it is predictable which specific BCR clonotypes are vaccine-induced in an individual. We sequenced the BCR heavy chain repertoire in a cohort of 19 individuals before and 7 days after Tdap booster vaccination. We tested two approaches to predict which clonotypes were expanded post-vaccination. First, we utilized a small database of monoclonal antibodies with known specificity to Tdap vaccine antigens and tested various sequence look-up methods, identifying clonal look-up as the best method. Second, we utilized a leave-one-out approach in which expanded clonotypes in one individual were predicted using data from other members of the cohort. The second approach significantly outperformed the first, indicating that BCR clonotype expansion can be learned across subjects. These results support the utility of systematically collecting BCR specificity data through efforts like the Immune Epitope Database and highlight the limitations of general prediction approaches resulting from relatively small dataset sizes for BCRs with known specificities. Additionally, our study provides 1) a comparison of several BCR specificity prediction methods, 2) a dataset that can be used for benchmarking of subsequent methods, and 3) a methodological framework for comparing BCR repertoires pre- and post-vaccination.
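
The leave-one-out evaluation described in this abstract can be sketched as follows. This is a deliberately simple stand-in and not the study's predictor: it assumes each clonotype has already been reduced to a comparable key, it predicts a clonotype as expanded if the same key was expanded in any other subject, and the names `leave_one_subject_out` and `repertoires` as well as the toy keys are hypothetical.

```python
def leave_one_subject_out(repertoires):
    """repertoires: {subject: {clonotype_key: was_expanded (bool)}}.

    For each held-out subject, predict a clonotype as expanded when the same key
    was expanded in at least one other subject. Returns {subject: (precision, recall)}.
    """
    results = {}
    for held_out, clonotypes in repertoires.items():
        # "Training" data: every clonotype expanded in any other subject.
        known_expanded = {
            key
            for subject, rep in repertoires.items() if subject != held_out
            for key, expanded in rep.items() if expanded
        }
        tp = sum(1 for k, e in clonotypes.items() if e and k in known_expanded)
        fp = sum(1 for k, e in clonotypes.items() if not e and k in known_expanded)
        fn = sum(1 for k, e in clonotypes.items() if e and k not in known_expanded)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[held_out] = (precision, recall)
    return results

# Toy cohort of three subjects; keys stand in for clonotype definitions.
repertoires = {
    "s1": {"A": True, "B": False, "C": True},
    "s2": {"A": True, "B": True, "D": False},
    "s3": {"A": False, "C": True, "D": False},
}
scores = leave_one_subject_out(repertoires)
```

Holding out whole subjects, rather than individual clonotypes, is what makes the comparison a test of whether expansion patterns transfer between people.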

Citations: 0

The gremlin in the works: why T cell receptor researchers need to pay more attention to germline reference sequences

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-08-28 DOI: 10.1016/j.immuno.2025.100058

James M. Heather , Ayelet Peres , Gur Yaari , William Lees

The rise of T cell receptor (TCR) sequencing technologies is driving both new understandings of the immune system and the development of novel clinical platforms. Such analyses rely on comparing recombined TCR sequences to unrearranged germline reference sequences during V(D)J annotation. In this study, we observed that, despite the importance of this step in TCR analysis, most published studies do not properly report the reference used. We use public datasets to illustrate why references should be explicitly specified: using IMGT/GENE-DB as an example, we document how the reference set changes over time. Furthermore, we illustrate how prescriptivist interpretations of reference metadata may be obscuring rather than illuminating TCR biology, and demonstrate the need to perform full V gene sequencing in order to unambiguously determine the final translated TCR polypeptide sequence. In summary, we argue that in order to ensure the accuracy and reproducibility of TCR sequencing – an ever more pressing task as more TCR-based diagnostics and therapeutics are developed – we should all take more care with the development, use, and reporting of the TCR germline references used in our science.
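
To make the point about reference sets changing over time concrete, here is a minimal sketch of how two releases of a germline reference could be compared. It is only an illustration, not the analysis from the paper: the function `diff_reference_sets`, the allele names (`TRXV1*01` and so on), and the sequences are all made up, and a real comparison would first parse each release (for example from FASTA files) into name-to-sequence mappings.

```python
from typing import Dict, NamedTuple, Set


class ReferenceDiff(NamedTuple):
    added: Set[str]    # allele names present only in the newer release
    removed: Set[str]  # allele names present only in the older release
    changed: Set[str]  # same name in both releases, different sequence


def diff_reference_sets(old: Dict[str, str], new: Dict[str, str]) -> ReferenceDiff:
    """Compare two {allele name: nucleotide sequence} reference releases."""
    old_names, new_names = set(old), set(new)
    shared = old_names & new_names
    return ReferenceDiff(
        added=new_names - old_names,
        removed=old_names - new_names,
        # Case-insensitive comparison, since case often differs between files.
        changed={name for name in shared if old[name].upper() != new[name].upper()},
    )


# Hypothetical releases; names and sequences are placeholders.
release_a = {"TRXV1*01": "ATGGCC", "TRXV2*01": "ATGTTT", "TRXV3*01": "ATGAAA"}
release_b = {"TRXV1*01": "atggcc", "TRXV2*01": "ATGTTC", "TRXV4*01": "ATGCCC"}

diff = diff_reference_sets(release_a, release_b)
print("added:", sorted(diff.added))
print("removed:", sorted(diff.removed))
print("changed:", sorted(diff.changed))
```

Any non-empty field in the result means that annotations produced with the two releases may disagree, which is why reporting the exact reference and its version alongside results matters.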

Citations: 0

Building immunoglobulin and T cell receptor gene databases for the future

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-08-27 DOI: 10.1016/j.immuno.2025.100059

Corey T. Watson , Andrew M. Collins , Mats Ohlin , James M. Heather , Ayelet Peres , William D. Lees , Gur Yaari

Genetic databases for immunoglobulin (IG) and T cell receptor (TR) genes have evolved from small catalogs to critical resources underpinning immunogenetic research. Accurate annotation enables the analysis of repertoire diversity, somatic hypermutation, clonal relationships, and lineage development. Recent advances in high-throughput repertoire sequencing and long-read genomics now allow for unprecedented discovery of germline variation across populations and species, but they also expose limitations of existing resources. Here, we discuss the historical evolution of IG/TR databases, highlight the challenges and opportunities presented by changing data landscapes, and outline strategies for building future databases that integrate genomic and expression data, support population diversity, and align with evolving nomenclature frameworks. Enhanced germline resources will be essential for accurate annotation, reproducible research, and the next generation of immunological discovery and clinical translation.

Citations: 0

Challenges and future directions of AIRR-seq-based diagnostics

Immunoinformatics (Amsterdam, Netherlands)

Pub Date : 2025-07-17 DOI: 10.1016/j.immuno.2025.100056

Ulrik Stervbo , Paraskevas Filippidis , Felix Breden , Lindsay G. Cowell , Frederic Davi , Victor Greiff , Anton W. Langerak , Eline T. Luning Prak , Alexandra F. Sharland , Enkelejda Miho , Pieter Meysman

Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a promising diagnostic method across various clinical conditions, yet its widespread implementation faces several challenges. This perspective examines the current landscape of AIRR-seq diagnostics and outlines key obstacles and opportunities for advancement. Critical challenges include the need for standardized quality controls, privacy protection under General Data Protection Regulation (GDPR) and Health Insurance Portability and Accountability Act (HIPAA) frameworks, and the development of clinically compatible bioinformatics pipelines. Machine learning approaches offer potential solutions for interpreting complex repertoire signatures, though these models must balance accuracy with interpretability for clinical adoption. Future applications may include early disease detection, prognosis, and monitoring of treatment and vaccine responses. However, successful clinical integration will require sustained collaboration among funding bodies, regulatory agencies, researchers, diagnosticians, and clinicians to establish clear guidelines and expand existing repositories with well-characterized patient samples. The collaborative efforts of the AIRR Diagnostics Working Group and the AIRR Community's initiatives are working towards unlocking the potential of AIRR-seq in precision medicine and enhancing diagnostic capabilities.

Citations: 0