Background: Immunopeptidomics is the large-scale study of peptides presented by major histocompatibility complex (MHC) molecules and plays a central role in neoantigen discovery and cancer immunotherapy. However, the complexity of mass spectrometry data, the diversity of peptide sources, and variability in immune responses present major challenges in this field.
Review focus: In recent years, artificial intelligence (AI)-based methods have become central to advancing key steps in immunopeptidomics. They have enabled advances in de novo sequencing, peptide-spectrum matching, spectrum prediction, MHC binding prediction, and T cell recognition modeling. In this review, we examine these applications in detail, highlighting how AI is integrated into each stage of the immunopeptidomics workflow.
Case study: This review presents a focused case study on breast cancer, a heterogeneous and historically less immunogenic tumor type, to examine how AI may help overcome limitations in identifying actionable neoantigens.
Challenges and future perspectives: We discuss current bottlenecks, including challenges in modeling noncanonical peptides, accounting for antigen processing defects, and avoiding on-target off-tumor toxicity. Finally, we outline future directions for improving AI models to support both personalized and off-the-shelf immunotherapy strategies.
Summary: Artificial intelligence (AI) is reshaping the immunopeptidomics landscape by overcoming challenges in peptide identification, immunogenicity prediction, and neoantigen prioritization. This review highlights how AI-based tools enhance the detection of MHC-bound peptides, including low-abundance, noncanonical, and post-translationally modified epitopes, and improve peptide-spectrum matching and T-cell epitope prediction. Through a case study on applications in breast cancer, we illustrate the potential of AI to reveal hidden immunogenic features in tumors previously considered immunologically "cold." These advancements open new opportunities for expanding neoantigen discovery pipelines and optimizing cancer immunotherapies. Looking ahead, the application of deep learning, transfer learning, and integrated multi-omics models may further improve the accuracy and scalability of immunopeptidomics, enabling more effective and inclusive vaccine and T-cell therapy development.
Background: Endometrial carcinoma (EC) represents a significant clinical challenge due to its pronounced molecular heterogeneity, which directly influences prognosis and therapeutic responses. Accurate classification of molecular subtypes (CNV-high, CNV-low, MSI-H, POLE) and precise tumor mutational burden (TMB) assessment are crucial for guiding personalized therapeutic interventions. Integrating proteomics data with advanced machine learning (ML) techniques offers a promising strategy for achieving precise, clinically actionable classification and biomarker discovery in EC.
Materials and methods: Using proteomic data from 95 EC patients (83 endometrioid, 12 serous), sourced from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we developed an ML pipeline integrating proteomic feature selection (Lasso-penalized logistic regression), classification modeling, and interpretability analysis. The dataset was divided into training (70%) and test (30%) sets, with the synthetic minority oversampling technique (SMOTE) applied to address class imbalance. Logistic regression models were trained for molecular subtype classification and TMB prediction, and model performance was evaluated using accuracy, AUC, precision, recall, and F1-score. Model interpretability was enhanced using explainable AI (XAI) techniques: SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME).
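The evaluation metrics named above can be illustrated for a binary task such as high versus low TMB. The following is a minimal sketch, not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Return accuracy, precision, recall, F1 and AUC for binary labels (1 = positive)."""
    # Turn predicted probabilities into hard class calls (threshold is an assumption).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC as the probability that a random positive case scores higher
    # than a random negative case (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


# Hypothetical test-set labels (1 = high TMB) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(binary_metrics(labels, scores))
```

AUC is computed from the raw scores and so does not depend on the threshold, whereas the other four metrics do. This may help explain why a model can report a high AUC alongside a more modest accuracy.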
Results: Feature selection reduced the proteomic dataset from 11,000 to eight key proteins. The proteomics-based ML model demonstrated robust predictive performance, accurately classifying EC molecular subtypes (accuracy: 82.8%; AUC: 0.990) and distinguishing high (≥10 mutations/Mb) versus low TMB (<10 mutations/Mb) cases (accuracy: 89.7%; AUC: 0.984). SHAP analysis highlighted clinically recognized biomarkers (MLH1, PMS2, STAT1) and identified novel protein candidates (MTHFD2, MAST4, RPL22L1, MX2, SEC16A). LIME analysis provided individualized prediction interpretations, clarifying each protein biomarker's influence on model decisions.
Conclusion: Our proteomics-driven ML approach demonstrates high accuracy and interpretability in EC subtype classification and TMB prediction. By identifying validated and novel biomarkers, this strategy provides essential biological insights and a strong foundation for the future development of non-invasive diagnostics, personalized treatments, and precision medicine in EC.
Purpose: Peptide-centric machine learning enhanced (PCML) data-independent acquisition tandem mass spectrometry (LC-MS/MS-DIA) matches low-abundance MS fragmentation spectra to in silico predicted peptide spectra deduced from libraries of customized protein sequences. The study's goal was to determine proteomic depth of coverage in microbial pathogen-containing clinical samples using that method.
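As a rough illustration of what matching an observed fragmentation spectrum to a predicted one can mean, the sketch below scores the two with a cosine similarity over matched peaks. It is a simplified stand-in and not the scoring used by the method in this study; the function name `spectral_cosine`, the 0.02 m/z tolerance, and the toy peak lists are assumptions.

```python
import math


def spectral_cosine(predicted, observed, tol=0.02):
    """Cosine similarity between predicted and observed fragment intensities.

    Both spectra are lists of (mz, intensity) tuples. For each predicted
    fragment, the most intense observed peak within +/- tol m/z is taken
    as its match; unmatched fragments contribute an intensity of zero.
    """
    matched = []
    for mz_p, _ in predicted:
        in_window = [inten for mz_o, inten in observed if abs(mz_o - mz_p) <= tol]
        matched.append(max(in_window) if in_window else 0.0)

    pred_int = [inten for _, inten in predicted]
    dot = sum(p * o for p, o in zip(pred_int, matched))
    norm = (math.sqrt(sum(p * p for p in pred_int))
            * math.sqrt(sum(o * o for o in matched)))
    return dot / norm if norm else 0.0


# Hypothetical predicted fragments and an observed spectrum with one extra peak.
predicted = [(100.0, 1.0), (200.0, 0.5)]
observed = [(100.01, 2.0), (200.0, 1.0), (300.0, 5.0)]
print(spectral_cosine(predicted, observed))
```

Because only the predicted fragments are looked up, unrelated peaks in the observed spectrum (here the one at m/z 300) do not lower the score. This mirrors the peptide-centric idea of querying the data for specific expected signals.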
Experimental design: We applied a published neural network-based machine learning method (DIA-NN) to the LC-MS/MS analysis of sputum protein digests derived from patients with lung infections.
Results: Nearly 6800 proteins in total and 1530 proteins of microbial origin were identified from single experiments, with coefficients of variation (CVs) of protein quantities among technical replicates as low as 0.12. Conventional spectral library searches of data from these experiments yielded fewer than 1600 and 60 protein identifications, respectively. Samples from two patients revealed colonization by pathogens that are difficult to clear from chronically infected lungs, Pseudomonas aeruginosa and Stenotrophomonas maltophilia. Abundant virulence factors in the datasets were the insulin-cleaving metalloproteinase IcmP (P. aeruginosa) and an inducer of human interleukin-10 expression (S. maltophilia). Each bacterium showed signs of adaptation to a hostile milieu, such as the expression of systems to generate energy anaerobically and the acquisition of host-sequestered metals.
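The repeatability figure above is a coefficient of variation, that is, the standard deviation of a protein's quantity across technical replicates divided by its mean. A minimal sketch follows; it assumes the sample standard deviation is used, which the abstract does not specify, and the replicate values are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(quantities):
    """CV = sample standard deviation / mean of replicate quantities."""
    m = mean(quantities)
    if m == 0:
        raise ValueError("CV is undefined when the mean quantity is zero")
    return stdev(quantities) / m


# Hypothetical quantities of one protein in three technical replicates.
replicates = [100.0, 110.0, 90.0]
print(coefficient_of_variation(replicates))
```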
Conclusions and clinical relevance: This work constitutes a step forward for protein-centered translational medicine in infectious diseases.
Summary: We demonstrate excellent depth of proteome coverage and experimental repeatability for low-abundance pathogen proteomes in human airway secretions via data-independent acquisition liquid chromatography tandem mass spectrometry leveraging machine learning for spectral analysis. The host's sputum proteome was also profiled, allowing inferences of immune defense mechanisms against pathogens. This proof-of-principle study shows the opportunity to gain insights into respiratory disease burdens and bacterial virulence by directly analyzing clinical specimens and the potential for biomarker discovery and pharmacodynamic response monitoring in interventional studies related to respiratory tract infections.
Purpose: Pulmonary hypertension (PH) is a chronic complication of sickle cell disease (SCD) with few known biomarkers beyond increases in plasma brain natriuretic peptide levels.
Experimental design: We conducted a proof-of-concept study to identify serum protein biomarkers that were differentially expressed in SCD patients with elevated tricuspid regurgitation velocity (TRV, a noninvasive marker of PH).
Results: Of 92 target proteins, 41 differed significantly (p < 0.05) between the nonelevated (TRV ≤ 2.6 m/s; N = 35) and highly elevated (TRV ≥ 2.9 m/s; N = 35) TRV groups. Six of them passed a Bonferroni correction (p value < 0.0005): T-cell surface glycoprotein, lymphotactin, SLAM family member 7, galectin-9, TNF-related apoptosis-inducing ligand receptor 2, and tumor necrosis factor receptor superfamily member 11A. We observed up to a 1.2-fold increase in the high TRV group for these six proteins. These six proteins had a strong positive correlation (r ≥ 0.44) with serum NT-proBNP levels, a positive control marker elevated in PH. Additionally, these markers correlated with other clinical parameters of PH in SCD.
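The Bonferroni cut-off quoted above is consistent with dividing a family-wise alpha of 0.05 by the 92 proteins tested, which gives roughly 0.00054. The sketch below shows that arithmetic and a simple filter. The alpha of 0.05 is an assumption inferred from the reported p < 0.05 threshold, and the protein names and p-values in the example are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05, n_tests=None):
    """Return names whose p-value falls below alpha / n_tests.

    p_values maps a protein name to its raw p-value. n_tests defaults to
    the number of p-values supplied.
    """
    n = n_tests if n_tests is not None else len(p_values)
    threshold = bonferroni_threshold(alpha, n)
    return [name for name, p in p_values.items() if p < threshold]


# 0.05 / 92 is about 0.00054, in line with the reported cut-off of 0.0005.
print(bonferroni_threshold(0.05, 92))

# Hypothetical raw p-values for three of the 92 target proteins.
toy = {"protein_A": 0.0001, "protein_B": 0.0200, "protein_C": 0.0004}
print(significant_after_bonferroni(toy, alpha=0.05, n_tests=92))
```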
Conclusion: Circulatory protein markers of the immune response are increased in SCD patients with elevated TRV as compared to those without elevated TRV.
Summary: This study demonstrates that the circulatory protein markers of the immune response are increased in SCD patients with elevated TRV compared to those without elevated TRV. These biomarkers may be important tools for risk-stratifying patients with SCD or targets for therapeutic intervention.
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) affects nearly one-fourth of the global population, yet effective diagnostics and treatments remain limited. Systemic immune dysregulation plays a key role in MASLD pathogenesis, highlighting the value of immune profiling.
Methods: In this study, we used high-dimensional single-cell mass cytometry (CyTOF) to analyze peripheral blood mononuclear cells (PBMCs) from healthy donors (n = 6), MASLD patients (n = 4), and MASLD patients treated with an 11β-hydroxysteroid dehydrogenase type 1 (11β-HSD1) inhibitor (n = 2). PBMCs were stained with a 29-marker panel to identify 15 immune cell types and assess cytokine expression.
Results: MASLD patients showed increased CD8⁺ T cells, early NK cells, and monocytes, along with reductions in TH2, TH1, late NK, and Treg cells. Cytokine profiling revealed elevated IL-6 expression in plasmacytoid dendritic cells and late NK cells, indicating systemic inflammation. Automated clustering (PhenoGraph, UMAP) identified NK and phagocytic subsets associated with disease and treatment. Notably, 11β-HSD1 inhibition led to downregulation of pro-inflammatory cytokines (e.g., IFN-γ, IL-6) and partial restoration of immune subsets.
Conclusions: These results offer a high-resolution view of immune alterations in MASLD and suggest that 11β-HSD1 inhibition may represent a promising immunomodulatory therapeutic strategy.
Cyclin-dependent kinase 4/6 inhibitors have transformed the treatment of hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer (BC). Ribociclib has been associated with survival gain, yet its potential cardiovascular toxicities (CVTs) remain an area of uncertainty. Our single-center study prospectively recruited adult patients to assess the incidence and spectrum of treatment-related CVT and to characterize differential protein expression in affected patients by data-independent acquisition liquid chromatography-tandem mass spectrometry (DIA LC-MS/MS). After a median follow-up of 27.2 months, five cases of CVT had occurred among the 62 enrolled participants (8.06%; mean age, 67 years). The CVTs were asymptomatic QTc prolongation, transient ischemic attack, deep vein thrombosis, syncope, and pericardial effusion, which developed within 7.56 months. In-depth proteomics quantified 144 differentially expressed proteins, of which 109 were down-regulated and 35 up-regulated in these five cases (enrolled participants with CVT) compared to five sex- and age-matched controls (enrolled participants without CVT). Negative regulation of endopeptidase activity, phosphatidylcholine metabolism, and immune response were the most affected signaling pathways in the subsequent functional analysis. Large-scale external validation of our hypothesis-generating findings could potentially support individualized cardiovascular prevention in BC patients under ribociclib combination therapy.
Summary: Ribociclib has unequivocally revolutionized hormone-dependent metastatic breast cancer therapeutics. Its potential cardiotoxicity, however, remains inadequately characterized, and the underlying pathophysiological mechanisms are poorly understood. Our prospective case-control study revealed that although cardiovascular toxicity was uncommon (<10%), its phenotype was not limited to QTc prolongation. Moreover, using mass spectrometry-based serum proteomics, we highlighted for the first time a number of distinct proteins that could be of predictive value for identifying patients at high risk. Prospective validation of the results of our preliminary, proof-of-concept study in larger cohorts could inform optimized preventive strategies.
Objective: Acidic and alkaline enzymes play crucial roles in the food industry and environmental management. This study aims to develop a computational method for accurately distinguishing between acidic and alkaline enzymes to enhance their stability in varying pH environments.
Methods: We employed AutoProp for feature extraction and the MRMD3.0 algorithm for feature selection. The most discriminative feature, the normalized van der Waals volume (nFeat43), was identified and used for classification.
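Classification on a single selected feature can be pictured as choosing one cut-off value that best separates the two classes. The sketch below fits such a threshold by maximizing training accuracy. The abstract does not state which classifier was used, so this decision-stump approach is an assumption, as are the function name `fit_threshold`, the label coding (0 = acidic, 1 = alkaline), and the toy feature values.

```python
def fit_threshold(values, labels):
    """Find the cut-off and direction on one feature that maximize accuracy.

    Returns (accuracy, threshold, direction). With direction = 1, samples
    above the threshold are predicted as class 1; with direction = -1,
    samples below it are.
    """
    ordered = sorted(set(values))
    # Candidate cut-offs are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    best = (0.0, None, 1)
    for thr in candidates:
        for direction in (1, -1):
            preds = [1 if direction * (v - thr) > 0 else 0 for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, thr, direction)
    return best


# Hypothetical feature values for six enzymes (0 = acidic, 1 = alkaline).
feature = [0.20, 0.30, 0.35, 0.60, 0.70, 0.80]
classes = [0, 0, 0, 1, 1, 1]
print(fit_threshold(feature, classes))
```

The accuracy returned here is measured on the same data used to pick the threshold. A held-out set or cross-validation would be needed to estimate how well it generalizes.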
Results: The selected feature (nFeat43) achieved a classification accuracy of 76.2% in distinguishing acidic from alkaline enzymes. Further analysis was conducted to interpret the physicochemical significance of this feature in enzyme discrimination.
Conclusions: Our findings demonstrate that nFeat43 is a key determinant in differentiating acidic and alkaline enzymes. This method provides a rapid and reliable computational approach for enzyme characterization, which could aid in industrial and environmental applications.
Purpose: Glucocorticoids are widely used for their anti-inflammatory properties, but their specific molecular mechanisms in treating rhegmatogenous retinal detachment with choroidal detachment (RRDCD) remain unclear. This study aims to identify key regulatory factors in the vitreous humor of RRDCD patients and analyze protein changes after hormonal intervention.
Methods: Vitreous fluid samples were collected during surgery from patients with rhegmatogenous retinal detachment (RRD, n = 40), non-glucocorticoid treated RRDCD (nT-RRDCD, n = 35), and glucocorticoid-treated RRDCD (T-RRDCD, n = 32). Primary outcomes were retinal reattachment status and best-corrected visual acuity (BCVA) at 6 months postoperatively. Proteomic analysis was performed using data-independent acquisition (DIA), with differentially expressed proteins validated by parallel reaction monitoring (PRM) and ELISA.
Results: Between RRD and nT-RRDCD, 203 differentially expressed proteins were identified, while 295 proteins were differentially expressed between nT-RRDCD and T-RRDCD. These proteins were involved in complement activation, immune response, blood coagulation, and MAPK signaling. Apolipoprotein D (APOD) and vitronectin (VTN) positively correlated with postoperative BCVA. APOD, serum amyloid A-4 (SAA4), and ubiquitin-conjugating enzyme E2 variant emerged as potential diagnostic biomarkers for RRDCD.
Conclusions: RRDCD development involves multiple factors. Glucocorticoids mitigate retinal damage by suppressing inflammation, regulating oxidative stress, and promoting cell repair. APOD and VTN correlate with BCVA, while APOD, SAA4, and ubiquitin-conjugating enzyme E2 show promise as diagnostic biomarkers for RRDCD.

