Data mining antibody sequences for database searching in bottom-up proteomics

Immunoinformatics (Amsterdam, Netherlands), Pub Date: 2024-09-01, Epub Date: 2024-08-22, DOI: 10.1016/j.immuno.2024.100042

Xuan-Tung Trinh, Rebecca Freitag, Konrad Krawczyk, Veit Schwämmle

Volume 15, Article 100042

Abstract

Mass spectrometry-based proteomics enables the identification and quantification of thousands of proteins, but it struggles to measure human antibodies because of their vast sequence diversity. Bottom-up proteomics relies primarily on database searches, in which experimentally measured peptide masses and fragmentation spectra are compared with those predicted from database sequences. While the human body can produce millions of distinct antibodies, current databases such as UniProtKB/Swiss-Prot contain only 1095 antibody sequences (as of January 2024), which can prevent antibody peptides from being identified by mass spectrometry. Expanding the search database is therefore crucial for discovering new antibodies. Recent antibody repertoire sequencing studies have deposited millions of human antibody sequences in the Observed Antibody Space (OAS) database, yet these data remain underused in proteomics. Leveraging this collection, we perform efficient database searches of publicly available proteomics data, focusing on SARS-CoV-2. Thirty million antibody heavy-chain sequences from 146 SARS-CoV-2 patients in OAS were digested in silico, yielding 18 million unique peptides. These peptides form the basis of new bottom-up proteomics databases, which we used to search for new antibody peptides in publicly available SARS-CoV-2 human plasma samples from the PRoteomics IDEntifications (PRIDE) database. Searches against negative controls (brain samples) and with databases of different sizes confirmed that this approach does not produce false-positive antibody peptide identifications. We show that previously unreported antibody peptides can be found in existing plasma datasets, and we expect that these newly discovered peptides can be used to develop therapeutic antibodies. The method is expected to be broadly applicable to finding characteristic antibodies in other diseases.
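
The core preprocessing step described above, digesting antibody sequences in silico and collapsing the result into a non-redundant peptide database, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not specify the protease or digestion settings, so trypsin (cleavage after K or R except before P), the missed-cleavage count and the 7 to 30 residue length window are assumed parameters. The example sequences are illustrative IGHV3-like fragments, not OAS records, and the FASTA header scheme (`OAS_pep_N`) is hypothetical. Matching is done on precursor mass only; real search engines also score fragment spectra.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified.
# Real searches usually add a fixed carbamidomethyl mass on Cys.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (N-terminal H + C-terminal OH)
PROTON = 1.007276   # proton mass, used to convert m/z to neutral mass

# Assumed trypsin rule: cleave after K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(seq, missed_cleavages=0, min_len=7, max_len=30):
    """Return the set of tryptic peptides of `seq` within the length window.

    The defaults are illustrative, not the settings used in the study.
    Peptides containing non-standard letters (e.g. X) are skipped.
    """
    fragments = [f for f in TRYPSIN_SITE.split(seq) if f]
    peptides = set()
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` extra fragments to model missed sites.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            pep = "".join(fragments[i:j + 1])
            if min_len <= len(pep) <= max_len and all(aa in MONO for aa in pep):
                peptides.add(pep)
    return peptides


def build_peptide_db(sequences, **digest_kwargs):
    """Digest every sequence and keep each distinct peptide once."""
    db = set()
    for seq in sequences:
        db |= tryptic_digest(seq, **digest_kwargs)
    return db


def peptide_mass(pep):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in pep) + WATER


def match_precursor(mz, charge, db, tol_ppm=10.0):
    """Peptides whose neutral mass is within `tol_ppm` of an observed precursor.

    Linear scan over the database; precursor mass only, no fragment scoring.
    """
    neutral = (mz - PROTON) * charge
    return sorted(p for p in db
                  if abs(peptide_mass(p) - neutral) / neutral * 1e6 <= tol_ppm)


def to_fasta(peptides, prefix="OAS_pep"):
    """Write peptides as FASTA records; the header scheme is hypothetical."""
    lines = []
    for i, pep in enumerate(sorted(peptides)):
        lines += [f">{prefix}_{i}", pep]
    return "\n".join(lines) + "\n"


# Illustrative IGHV3-like heavy-chain fragments (not OAS records).
seqs = [
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS",
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSDYYMSWIR",
]
db = build_peptide_db(seqs)
print(len(db), "unique peptides")  # prints "3 unique peptides"
print(to_fasta(db), end="")
```

Storing peptides in a set means that a peptide shared by many sequences (here the framework peptide EVQLVESGGGLVQPGGSLR, present in both inputs) is kept only once, which is the kind of redundancy reduction behind going from thirty million sequences to 18 million unique peptides. For a database of that size, the linear scan in `match_precursor` would in practice be replaced by a mass-sorted index queried with binary search.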
