Malaria is caused by Plasmodium, a parasite that replicates inside and ruptures erythrocytes, causing an intense inflammatory response. Advances in high-throughput sequencing technologies have enabled the simultaneous study of gene expression in humans and P. falciparum. However, the high-dimensional correlation networks generated in previous studies are difficult to interpret biologically, and associations found in one cohort might not replicate in independent samples because of confounding factors affecting gene expression. We combined multicohort analysis of correlations with a hierarchical grouping approach to improve the discovery and interpretation of transcriptional associations between humans and P. falciparum. We analyzed nine public dual-transcriptomes acquired from whole blood of individuals infected with P. falciparum. Blood Transcription Modules (BTMs) were used to reduce the dimension of host transcriptomes, and Spearman's correlation analysis was used to identify host-parasite associations. We then performed a meta-analysis of correlations with Stouffer's method and Bonferroni correction, which resulted in a major transcriptional meta-network between humans and P. falciparum. We identified, for example, positive correlations between PAK1, NFKBIA, BIRC2, NLRC4, TLR4, and RIPK2 expression and PF3D7_1205800, a putative P. falciparum high mobility group protein B3 (HMGB3). We also applied a leave-one-out strategy to limit the influence of confounding factors, resulting in highly conserved associations between host genes related to inflammation, immune cells, and glycerophospholipid metabolism and PF3D7_1223400, which encodes a putative phospholipid-transporting ATPase. Paired metabolomics and transcriptomics data revealed a negative correlation between PF3D7_1223400 expression and the relative abundance of 1-linoleoyl-GPG. Collectively, our study provides data-driven hypotheses about molecular mechanisms of host-parasite interaction.
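The meta-analysis step described above can be illustrated with a minimal sketch. It assumes that each cohort contributes one Spearman correlation coefficient and its two-sided p-value for a given host-parasite pair, and it uses an unweighted, sign-aware form of Stouffer's method; the abstract does not state whether the authors weighted cohorts or how they handled direction, so those choices, along with the function names `stouffer_signed` and `bonferroni`, are assumptions made for illustration only.

```python
from math import copysign, erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def stouffer_signed(rhos, pvalues):
    """Combine per-cohort two-sided p-values into one z-score and p-value.

    Assumption: unweighted Stouffer's method, with each z-score given the
    sign of the cohort's correlation so that opposite effects cancel.
    """
    zs = []
    for rho, p in zip(rhos, pvalues):
        p = min(max(p, 1e-300), 1.0)          # keep p inside (0, 1]
        z = -_STD_NORMAL.inv_cdf(p / 2.0)     # two-sided p -> |z|
        zs.append(copysign(z, rho))           # attach effect direction
    z_comb = sum(zs) / sqrt(len(zs))          # unweighted combination
    p_comb = erfc(abs(z_comb) / sqrt(2.0))    # two-sided p for z_comb
    return z_comb, p_comb


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical values for one gene pair observed in three cohorts.
    z, p = stouffer_signed([0.5, 0.6, 0.4], [0.01, 0.02, 0.05])
    print(z, p, bonferroni(p, n_tests=1000))
```

In this sketch, a pair that correlates in the same direction across cohorts accumulates evidence, whereas a pair whose sign flips between cohorts is pulled toward zero, which is one simple way to favor associations that replicate.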
The CD8 T cell immune response operates at multiple temporal and spatial scales, ranging from early, complex biochemical and biomechanical processes to long-term cell population behavior.
In order to model this response, we devised a multiscale agent-based approach using Simuscale software. Within each agent (cell) of our model, we introduced a gene regulatory network (GRN) based upon a piecewise deterministic Markov process formalism. Cell fate – differentiation, proliferation, death – was coupled to the state of the GRN through rule-based mechanisms. Cells interact in a 3D computational domain and signal to each other via cell–cell contacts, influencing the GRN behavior.
Results show that the model correctly captures both population behavior and time-dependent molecular evolution. We examined the impact of several parameters on molecular and population dynamics, and demonstrated the added value of a multiscale approach by showing how molecular parameters, particularly protein degradation rates, influence the outcome of the response, such as effector and memory cell counts.
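To make the piecewise deterministic Markov process (PDMP) formalism mentioned above more concrete, the following is a minimal sketch for a single gene, not the authors' gene regulatory network and not Simuscale code. It assumes a promoter that switches between off and on at constant rates (the random jumps) and a protein level that follows a linear ODE between jumps (the deterministic flow). All names and parameters (`k_on`, `k_off`, `s`, `d`) are hypothetical.

```python
import math
import random


def simulate_pdmp(k_on, k_off, s, d, t_end, p0=0.0, seed=0):
    """Simulate one gene as a PDMP and return (times, protein levels).

    Assumptions: promoter state E in {0, 1} jumps at constant rates
    k_on (0 -> 1) and k_off (1 -> 0); between jumps the protein P obeys
    dP/dt = s * E - d * P, which is solved in closed form.
    """
    rng = random.Random(seed)
    t, promoter, protein = 0.0, 0, p0
    times, proteins = [t], [protein]
    while t < t_end:
        rate = k_on if promoter == 0 else k_off
        # Exponential waiting time until the next promoter jump.
        wait = rng.expovariate(rate) if rate > 0 else math.inf
        jump = wait < (t_end - t)
        dt = wait if jump else (t_end - t)
        # Deterministic flow: exponential relaxation toward the target.
        target = s / d if promoter == 1 else 0.0
        protein = target + (protein - target) * math.exp(-d * dt)
        t = t + dt if jump else t_end
        if jump:
            promoter = 1 - promoter
        times.append(t)
        proteins.append(protein)
    return times, proteins


if __name__ == "__main__":
    times, proteins = simulate_pdmp(k_on=0.5, k_off=0.5, s=10.0, d=1.0,
                                    t_end=20.0)
    print(len(times), proteins[-1])
```

Here the degradation rate `d` sets both the ceiling `s / d` and how quickly the protein tracks promoter switches, which gives a rough intuition for why degradation rates could shape downstream cell-fate rules in a multiscale model.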
Mass spectrometry-based proteomics facilitates the identification and quantification of thousands of proteins but encounters challenges in measuring human antibodies due to their vast diversity. Bottom-up proteomics methods primarily rely on database searches, comparing experimental peptide values to theoretical database sequences. While the human body can produce millions of distinct antibodies, current databases, such as UniProtKB/Swiss-Prot, contain only 1095 sequences (as of January 2024), potentially hindering antibody identification via mass spectrometry. Therefore, expanding the database is crucial for discovering new antibodies. Recent genomic studies have amassed millions of human antibody sequences in the Observed Antibody Space (OAS) database, yet this data remains underutilized. Leveraging this vast collection, we conduct efficient database searches in publicly available proteomics data, focusing on SARS-CoV-2. In our study, thirty million heavy antibody sequences from 146 SARS-CoV-2 patients in the OAS database were digested in silico to obtain 18 million unique peptides. These peptides form the basis for new bottom-up proteomics databases. We used those databases for searching new antibody peptides in publicly available SARS-CoV-2 human plasma samples in the Proteomics Identification Database (PRIDE). This approach avoids false positives in antibody peptide identification as confirmed by searching against negative controls (brain samples) and employing different database sizes. We show that new antibody peptides were found in previous plasma samples and expect that the newly discovered antibody peptides can be further employed to develop therapeutic antibodies. The method will be broadly applicable to find characteristic antibodies for other diseases.
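The in silico digestion step described above can be sketched as follows. The abstract does not name the protease or the digestion settings, so this example assumes the common trypsin rule (cleave after K or R unless the next residue is P) and an optional number of missed cleavages; the function names and the toy sequences are hypothetical and are not taken from the OAS database.

```python
import re

# Assumed trypsin rule: cut after K or R, but not when followed by P.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence, missed_cleavages=0):
    """Return the peptides from one sequence, in order of start position."""
    fragments = [f for f in _TRYPSIN_SITE.split(sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` extra neighboring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def unique_peptides(sequences, missed_cleavages=0, min_len=1):
    """Collect the set of unique peptides across many sequences."""
    unique = set()
    for seq in sequences:
        for pep in tryptic_digest(seq, missed_cleavages):
            if len(pep) >= min_len:
                unique.add(pep)
    return unique


if __name__ == "__main__":
    toy_sequences = ["AAKPGRCCKDD", "CCKAAR"]  # toy strings, not real data
    print(sorted(unique_peptides(toy_sequences, min_len=3)))
```

Deduplicating across sequences is what reduces a very large set of antibody sequences to a smaller set of unique peptides suitable for a search database; the length filter stands in for whatever peptide-length limits a real search engine would apply.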
With the application of spatial biology, the detection and identification of the diverse cell types present in the tumor microenvironment, including specific immune subsets, is possible at single-cell resolution. Since spatial biology analysis of tumor tissue allows multiple biological parameters to be measured, including cell type, cell number, cell state, and the precise location and spatial relationship of every cell to other cells and histopathological hallmarks, a vast amount of data is generated. The power of these data is realized when they are correlated with clinical data for each patient from whom the tissue was collected during a biopsy or surgery conducted as part of diagnosis and treatment. Aside from the enormous leap in chemistry and molecular biology technology required to develop the analytical tools for spatial biology, the collection and analysis of data on cells in the tumor microenvironment has been possible only with the development of computational tools capable of deciphering tumor tissue complexity to predict tumor evolution, response to treatment, and the role of immune cells in regulating tumor biology. Here we describe how spatial biology analysis, combined with computational analysis, has been used to deconstruct the complexity of the brain tumor microenvironment and shed light on why brain tumors exhibit extreme immunosuppression. We also discuss how the understanding gained using spatial biology has shed light on how tumor immunosuppression can be overcome.
Recent advancements in immune sequencing and experimental techniques are generating extensive T cell receptor (TCR) repertoire data, enabling the development of models to predict TCR binding specificity. Despite the computational challenges posed by the vast diversity of TCRs and epitopes, significant progress has been made. This review explores the evolution of computational models designed for this task, emphasizing machine learning efforts, including early unsupervised clustering approaches, supervised models, and recent applications of Protein Language Models (PLMs), deep learning models pretrained on extensive collections of unlabeled protein sequences that capture crucial biological properties.
We survey the most prominent models in each category and offer a critical discussion on recurrent challenges, including the lack of generalization to new epitopes, dataset biases, and shortcomings in model validation designs. Focusing on PLMs, we discuss the transformative impact of Transformer-based protein models in bioinformatics, particularly in TCR specificity analysis. We discuss recent studies that exploit PLMs to deliver notably competitive performances in TCR-related tasks, while also examining current limitations and future directions. Lastly, we address the pressing need for improved interpretability in these often opaque models, and examine current efforts to extract biological insights from large black box models.
Proper CD8 T cell differentiation during antiviral responses relies on metabolic adaptations. Herein, we investigated global metabolic activity in single CD8 T cells over the course of an in vivo response by estimating metabolic fluxes from single-cell RNA-sequencing data. The approach was validated by the observation of metabolic variations known from experimental studies on global cell populations, while adding temporally detailed information and unravelling previously undescribed sections of CD8 T cell metabolism that are affected by cellular differentiation. Furthermore, inter-cellular variability in gene expression level, highlighted by single-cell data, and heterogeneity of metabolic activity 4 days post-infection revealed a new transition stage accompanied by a metabolic switch in activated cells differentiating into full-blown effectors.
Deciphering the antigen recognition capabilities of T-cell and B-cell receptors (antibodies) is essential for advancing our understanding of adaptive immune system responses. In recent years, protein language models (PLMs) have facilitated the development of bioinformatic pipelines in which complex amino acid sequences are transformed into vectorized embeddings, which are then applied to a range of downstream analytical tasks. With their success, we have witnessed the emergence of domain-specific PLMs tailored to specific proteins, such as immune receptors. Domain-specific models are often assumed to possess enhanced representation capabilities for targeted applications; however, this assumption has not been thoroughly evaluated. In this manuscript, we assess the efficacy of both generalist and domain-specific transformer-based embeddings in characterizing B and T-cell receptors. Specifically, we assess the accuracy of models that leverage these embeddings to predict antigen specificity and elucidate the evolutionary changes that B cells undergo during an immune response. We demonstrate that the prevailing notion of domain-specific models outperforming general models requires a more nuanced examination. We also observe remarkable differences between generalist and domain-specific PLMs, not only in terms of performance but also in the manner in which they encode information. Finally, we observe that model size and the choice of embedding layer are essential hyperparameters whose best values differ across tasks. Overall, our analyses reveal the promising potential of PLMs in modeling protein function while providing insights into their information-handling capabilities. We also discuss the crucial factors that should be taken into account when selecting a PLM tailored to a particular task.
It has become routine to gain insights into the multi-scale nature of the immune response in health and disease through ‘omics datasets. This presents us with a unique opportunity to leverage our access to such data to develop computational models that can generate usable predictions and mechanistic insights capable of seeding new ideas. However, this is a particularly challenging task due to the difficulty of integrating data and processes across multiple scales. In this review, we discuss some of the challenges associated with this task, as well as the recent advances and opportunities that will help make them tractable, using an innate lymphocyte, the natural killer cell, as an exemplar.